sparknlp.base.gguf_ranking_finisher

Contains classes for the GGUFRankingFinisher.

Module Contents

Classes

GGUFRankingFinisher

Finisher for AutoGGUFReranker outputs that provides ranking capabilities

class GGUFRankingFinisher

Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.

This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:

  • Top-k selection: Select only the top k documents by relevance score

  • Score thresholding: Filter documents by minimum relevance score

  • Min-max scaling: Normalize relevance scores to 0-1 range

  • Sorting: Sort documents by relevance score in descending order

  • Ranking: Add rank information to document metadata

The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
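The post-processing options listed above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function name `rank_documents`, the `(text, metadata)` tuples standing in for DOCUMENT annotations, the metadata keys `relevance_score` and `rank`, and the order in which the steps are applied are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (text, metadata): a hypothetical stand-in for a DOCUMENT annotation.
Doc = Tuple[str, Dict[str, str]]


def rank_documents(
    docs: List[Doc],
    top_k: int = -1,
    min_relevance_score: float = float("-inf"),
    min_max_scaling: bool = False,
) -> List[Doc]:
    """Illustrative ranking: scale, filter, sort, truncate, then add ranks."""
    # Copy the metadata so the input documents are left untouched.
    scored = [(text, dict(meta), float(meta["relevance_score"])) for text, meta in docs]

    # Optional min-max scaling of the scores to the 0-1 range.
    if min_max_scaling and scored:
        lo = min(score for _, _, score in scored)
        hi = max(score for _, _, score in scored)
        span = hi - lo
        # Assumption: when all scores are equal, map every score to 1.0.
        scored = [(t, m, (s - lo) / span if span > 0 else 1.0) for t, m, s in scored]

    # Score thresholding (assumed here to apply to the possibly scaled scores).
    scored = [item for item in scored if item[2] >= min_relevance_score]

    # Sort by relevance score in descending order.
    scored.sort(key=lambda item: item[2], reverse=True)

    # Top-k selection; non-positive values are treated as "no limit" in this sketch.
    if top_k > 0:
        scored = scored[:top_k]

    # Write the score and a 1-based rank back into the metadata.
    ranked = []
    for rank, (text, meta, score) in enumerate(scored, start=1):
        meta["relevance_score"] = str(score)
        meta["rank"] = str(rank)
        ranked.append((text, meta))
    return ranked


if __name__ == "__main__":
    example = [
        ("A man is eating food.", {"relevance_score": "6.0"}),
        ("The girl is carrying a baby.", {"relevance_score": "2.0"}),
        ("A man is riding a horse.", {"relevance_score": "4.0"}),
    ]
    for text, meta in rank_documents(example, top_k=2, min_max_scaling=True):
        print(meta["rank"], meta["relevance_score"], text)
```

Whether thresholding is applied before or after scaling changes which documents survive, so the actual behaviour should be checked against the library before relying on this ordering.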

For extended examples of usage, see the Examples.

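====================== ======================
Input Annotation types Output Annotation type
====================== ======================
DOCUMENT               DOCUMENT
====================== ======================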

Parameters:
inputCols

Name of input annotation columns containing reranked documents

outputCol

Name of output annotation column containing ranked documents, by default “ranked_documents”

topK

Maximum number of top documents to return based on relevance score (-1 for no limit), by default -1

minRelevanceScore

Minimum relevance score threshold for filtering documents, by default Double.MinValue (the most negative finite double on the Scala side, so no documents are filtered out)

minMaxScaling

Whether to apply min-max scaling to normalize relevance scores to 0-1 range, by default False
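For reference, min-max scaling in its standard form maps each score linearly onto the 0-1 range. The symbols below are introduced here for illustration only: $s_i$ is the relevance score of document $i$, and $s_{\min}$ and $s_{\max}$ are the smallest and largest scores among the documents being ranked together.

```latex
s_i' = \frac{s_i - s_{\min}}{s_{\max} - s_{\min}},
\qquad s_{\min} = \min_j s_j, \quad s_{\max} = \max_j s_j
```

For example, scores of 2, 4 and 6 become 0, 0.5 and 1. The formula is undefined when all scores are equal ($s_{\max} = s_{\min}$); how the finisher handles that case is not stated here.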

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> reranker = AutoGGUFReranker.pretrained() \
...     .setInputCols("document") \
...     .setOutputCol("reranked_documents") \
...     .setQuery("A man is eating pasta.")
>>> finisher = GGUFRankingFinisher() \
...     .setInputCols("reranked_documents") \
...     .setOutputCol("ranked_documents") \
...     .setTopK(3) \
...     .setMinMaxScaling(True)
>>> pipeline = Pipeline().setStages([documentAssembler, reranker, finisher])
>>> data = spark.createDataFrame([
...     ("A man is eating food.",),
...     ("A man is eating a piece of bread.",),
...     ("The girl is carrying a baby.",),
...     ("A man is riding a horse.",)
... ], ["text"])
>>> result = pipeline.fit(data).transform(data)
>>> result.select("ranked_documents").show(truncate=False)
# Documents will be sorted by relevance with rank information in metadata
name = 'GGUFRankingFinisher'
inputCols
outputCol
topK
minRelevanceScore
minMaxScaling
setParams()
setInputCols(*value)

Sets input annotation column names.

Parameters:
value : List[str]

Input annotation column names containing reranked documents

getInputCols()

Gets input annotation column names.

Returns:
List[str]

Input annotation column names

setOutputCol(value)

Sets output annotation column name.

Parameters:
value : str

Output annotation column name

getOutputCol()

Gets output annotation column name.

Returns:
str

Output annotation column name

setTopK(value)

Sets maximum number of top documents to return.

Parameters:
value : int

Maximum number of top documents to return (-1 for no limit)

getTopK()

Gets maximum number of top documents to return.

Returns:
int

Maximum number of top documents to return

setMinRelevanceScore(value)

Sets minimum relevance score threshold.

Parameters:
value : float

Minimum relevance score threshold

getMinRelevanceScore()

Gets minimum relevance score threshold.

Returns:
float

Minimum relevance score threshold

setMinMaxScaling(value)

Sets whether to apply min-max scaling.

Parameters:
value : bool

Whether to apply min-max scaling to normalize scores

getMinMaxScaling()

Gets whether to apply min-max scaling.

Returns:
bool

Whether min-max scaling is enabled