`sparknlp.base.gguf_ranking_finisher`

Contains classes for the GGUFRankingFinisher.

Module Contents

Classes

GGUFRankingFinisher

Finisher for AutoGGUFReranker outputs that provides ranking capabilities

class GGUFRankingFinisher

Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.

This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:

Top-k selection: Select only the top k documents by relevance score
Score thresholding: Filter documents by minimum relevance score
Min-max scaling: Normalize relevance scores to 0-1 range
Sorting: Sort documents by relevance score in descending order
Ranking: Add rank information to document metadata

The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
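The sorting and ranking steps can be pictured with a small pure-Python sketch. It is not the library's implementation: `Doc` is a hypothetical stand-in for a `DOCUMENT` annotation, and the metadata keys `relevance_score` (where the score is read from) and `rank` (where the rank is written) are assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a DOCUMENT annotation (only the fields used here)."""
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)


def sort_and_rank(docs: List[Doc], score_key: str = "relevance_score") -> List[Doc]:
    """Sort documents by score (highest first) and attach a 1-based rank.

    Assumption: scores are stored as strings under `score_key`, since Spark NLP
    annotation metadata is a string-to-string map.
    """
    ordered = sorted(docs, key=lambda d: float(d.metadata[score_key]), reverse=True)
    # Build new metadata dicts so the input documents are not mutated;
    # every other field of each document is carried over unchanged.
    return [
        replace(d, metadata={**d.metadata, "rank": str(i)})
        for i, d in enumerate(ordered, start=1)
    ]


docs = [
    Doc("doc a", {"relevance_score": "0.2"}),  # made-up scores for illustration
    Doc("doc b", {"relevance_score": "0.9"}),
    Doc("doc c", {"relevance_score": "0.5"}),
]
for d in sort_and_rank(docs):
    print(d.metadata["rank"], d.result)
```

Copying the metadata rather than editing it in place mirrors the point above: the annotation structure is kept and only ranking information is added.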

For extended examples of usage, see the Examples.

Input Annotation types	Output Annotation type
`DOCUMENT`	`DOCUMENT`

Parameters:

inputCols: Names of the input annotation columns containing reranked documents
outputCol: Name of the output annotation column containing ranked documents, by default “ranked_documents”
topK: Maximum number of top documents to return based on relevance score (-1 for no limit), by default -1
minRelevanceScore: Minimum relevance score threshold for filtering documents, by default Double.MinValue (the most negative finite double, so effectively no threshold)
minMaxScaling: Whether to apply min-max scaling to normalize relevance scores to 0-1 range, by default False
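How the three parameters might combine is sketched below on plain floats. The order of operations (scale, then filter, then sort and truncate) is an assumption: the description above does not say whether `minRelevanceScore` is compared against raw or scaled scores, nor how scaling handles equal scores. The default threshold uses `-sys.float_info.max`, which is the same value as Scala's `Double.MinValue`.

```python
import sys
from typing import List


def finish_scores(
    scores: List[float],
    top_k: int = -1,
    min_relevance_score: float = -sys.float_info.max,  # same value as Double.MinValue
    min_max_scaling: bool = False,
) -> List[float]:
    """Hypothetical composition of topK, minRelevanceScore and minMaxScaling."""
    if min_max_scaling and scores:
        lo, hi = min(scores), max(scores)
        span = hi - lo
        # Assumption: when all scores are equal, map every score to 1.0
        # instead of dividing by zero.
        scores = [(s - lo) / span if span > 0 else 1.0 for s in scores]
    # Keep scores at or above the threshold, highest first.
    kept = sorted((s for s in scores if s >= min_relevance_score), reverse=True)
    # A negative top_k (the documented -1) means "no limit".
    return kept if top_k < 0 else kept[:top_k]


print(finish_scores([2.0, 4.0, 3.0], top_k=2, min_max_scaling=True))  # [1.0, 0.5]
```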

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> reranker = AutoGGUFReranker.pretrained() \
...     .setInputCols("document") \
...     .setOutputCol("reranked_documents") \
...     .setQuery("A man is eating pasta.")
>>> finisher = GGUFRankingFinisher() \
...     .setInputCols("reranked_documents") \
...     .setOutputCol("ranked_documents") \
...     .setTopK(3) \
...     .setMinMaxScaling(True)
>>> pipeline = Pipeline().setStages([documentAssembler, reranker, finisher])
>>> data = spark.createDataFrame([
...     ("A man is eating food.",),
...     ("A man is eating a piece of bread.",),
...     ("The girl is carrying a baby.",),
...     ("A man is riding a horse.",)
... ], ["text"])
>>> result = pipeline.fit(data).transform(data)
>>> result.select("ranked_documents").show(truncate=False)
# Documents will be sorted by relevance with rank information in metadata
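To inspect the result in Python, the annotations can be collected and ordered by the rank stored in their metadata. The helper below is hypothetical: it assumes the rank lives under a `rank` metadata key, and `Annotation` is a stand-in for a collected annotation row (real rows also carry `annotatorType`, `begin`, `end` and `embeddings`). It flattens across rows, so it does not depend on how the ranked documents are spread over the DataFrame's rows.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a collected annotation Row; only the two fields used here.
Annotation = namedtuple("Annotation", ["result", "metadata"])


def ranked_texts(rows: Iterable, col: str = "ranked_documents") -> List[Tuple[int, str]]:
    """Flatten annotations in `col` across rows into (rank, text) pairs, best first.

    `rows` may be anything indexable by column name, e.g. pyspark Rows or dicts.
    """
    pairs = [(int(a.metadata["rank"]), a.result) for row in rows for a in row[col]]
    return sorted(pairs)


# With a live session this would be called on collected Rows, e.g.:
#   ranked_texts(result.select("ranked_documents").collect())
rows = [
    {"ranked_documents": [Annotation("doc b", {"rank": "2"})]},
    {"ranked_documents": [Annotation("doc a", {"rank": "1"})]},
    {"ranked_documents": []},  # illustrative row with no remaining documents
]
print(ranked_texts(rows))
```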

name = 'GGUFRankingFinisher'

inputCols

outputCol

topK

minRelevanceScore

minMaxScaling

setParams()

setInputCols(*value)

Sets the input annotation column names.

Parameters:

value (List[str]): Input annotation column names containing reranked documents

getInputCols()

Gets the input annotation column names.

Returns:

List[str]: Input annotation column names

setOutputCol(value)

Sets the output annotation column name.

Parameters:

value (str): Output annotation column name

getOutputCol()

Gets the output annotation column name.

Returns:

str: Output annotation column name

setTopK(value)

Sets the maximum number of top documents to return.

Parameters:

value (int): Maximum number of top documents to return (-1 for no limit)

getTopK()

Gets the maximum number of top documents to return.

Returns:

int: Maximum number of top documents to return

setMinRelevanceScore(value)

Sets the minimum relevance score threshold.

Parameters:

value (float): Minimum relevance score threshold

getMinRelevanceScore()

Gets the minimum relevance score threshold.

Returns:

float: Minimum relevance score threshold

setMinMaxScaling(value)

Sets whether to apply min-max scaling.

Parameters:

value (bool): Whether to apply min-max scaling to normalize scores
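Min-max scaling, as usually defined, maps each score $s_i$ in the set of scores being ranked into the 0-1 range, so the highest score becomes 1 and the lowest becomes 0. This is the textbook formula; how the finisher handles the degenerate case where all scores are equal (a zero denominator) is not stated here.

```latex
s_i' = \frac{s_i - \min_j s_j}{\max_j s_j - \min_j s_j}
```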

getMinMaxScaling()

Gets whether to apply min-max scaling.

Returns:

bool: Whether min-max scaling is enabled
