sparknlp.base.gguf_ranking_finisher#
Contains classes for the GGUFRankingFinisher.
Module Contents#
Classes#
GGUFRankingFinisher | Finisher for AutoGGUFReranker outputs that provides ranking capabilities
- class GGUFRankingFinisher[source]#
Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.
This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:
- Top-k selection: Select only the top k documents by relevance score
- Score thresholding: Filter documents by minimum relevance score
- Min-max scaling: Normalize relevance scores to the 0-1 range
- Sorting: Sort documents by relevance score in descending order
- Ranking: Add rank information to document metadata
The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
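The post-processing steps described above can be sketched in plain Python. This is an illustrative approximation, not the finisher's actual implementation: the list-of-dicts document shape, the `rank_documents` helper name, and the ordering of the steps (filter, sort, truncate, scale, rank) are all assumptions for the sketch.

```python
def rank_documents(docs, top_k=-1, min_relevance_score=float("-inf"),
                   min_max_scaling=False):
    """Illustrative sketch of top-k selection, score thresholding,
    min-max scaling, sorting, and ranking. The dict-based document
    representation is an assumption, not the Spark NLP annotation type."""
    # Score thresholding: drop documents below the minimum relevance score.
    kept = [dict(d) for d in docs if d["score"] >= min_relevance_score]
    # Sorting: order by relevance score, descending.
    kept.sort(key=lambda d: d["score"], reverse=True)
    # Top-k selection: -1 means no limit.
    if top_k >= 0:
        kept = kept[:top_k]
    # Min-max scaling: normalize surviving scores to the 0-1 range.
    if min_max_scaling and kept:
        lo = min(d["score"] for d in kept)
        hi = max(d["score"] for d in kept)
        span = hi - lo
        for d in kept:
            d["score"] = (d["score"] - lo) / span if span else 1.0
    # Ranking: attach 1-based rank information.
    for rank, d in enumerate(kept, start=1):
        d["rank"] = rank
    return kept

docs = [{"text": "bread", "score": 0.7},
        {"text": "horse", "score": 0.1},
        {"text": "food", "score": 0.9}]
ranked = rank_documents(docs, top_k=2, min_max_scaling=True)
```

With these inputs, "food" and "bread" survive the top-k cut, their scores are rescaled to 1.0 and 0.0, and they receive ranks 1 and 2.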
For extended examples of usage, see the Examples section below.
Input Annotation types: DOCUMENT
Output Annotation type: DOCUMENT
- Parameters:
- inputCols
Names of input annotation columns containing reranked documents
- outputCol
Name of output annotation column containing ranked documents, by default “ranked_documents”
- topK
Maximum number of top documents to return based on relevance score (-1 for no limit), by default -1
- minRelevanceScore
Minimum relevance score threshold for filtering documents, by default Double.MinValue
- minMaxScaling
Whether to apply min-max scaling to normalize relevance scores to 0-1 range, by default False
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> reranker = AutoGGUFReranker.pretrained() \
...     .setInputCols("document") \
...     .setOutputCol("reranked_documents") \
...     .setQuery("A man is eating pasta.")
>>> finisher = GGUFRankingFinisher() \
...     .setInputCols("reranked_documents") \
...     .setOutputCol("ranked_documents") \
...     .setTopK(3) \
...     .setMinMaxScaling(True)
>>> pipeline = Pipeline().setStages([documentAssembler, reranker, finisher])
>>> data = spark.createDataFrame([
...     ("A man is eating food.",),
...     ("A man is eating a piece of bread.",),
...     ("The girl is carrying a baby.",),
...     ("A man is riding a horse.",)
... ], ["text"])
>>> result = pipeline.fit(data).transform(data)
>>> result.select("ranked_documents").show(truncate=False)

The documents are sorted by relevance score, with rank information added to the metadata.
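Downstream code typically reads the rank and score back out of the annotation metadata. The snippet below mocks the output rather than running a real pipeline, and the metadata key names (`"rank"`, `"relevance_score"`) are assumptions about the output layout; check the actual metadata of your results before relying on them.

```python
# Hypothetical shape of the finisher's output annotations; the key
# names and values are assumptions for illustration only.
ranked_documents = [
    {"result": "A man is eating food.",
     "metadata": {"rank": "1", "relevance_score": "0.93"}},
    {"result": "A man is eating a piece of bread.",
     "metadata": {"rank": "2", "relevance_score": "0.71"}},
]

# Spark NLP metadata values are strings, so cast before use.
pairs = [(int(a["metadata"]["rank"]),
          float(a["metadata"]["relevance_score"]),
          a["result"])
         for a in ranked_documents]
```

`pairs` then holds `(rank, score, text)` tuples ready for display or evaluation.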
- setInputCols(*value)[source]#
Sets input annotation column names.
- Parameters:
- value : List[str]
Input annotation column names containing reranked documents
- getInputCols()[source]#
Gets input annotation column names.
- Returns:
- List[str]
Input annotation column names
- setOutputCol(value)[source]#
Sets output annotation column name.
- Parameters:
- value : str
Output annotation column name
- getOutputCol()[source]#
Gets output annotation column name.
- Returns:
- str
Output annotation column name
- setTopK(value)[source]#
Sets maximum number of top documents to return.
- Parameters:
- value : int
Maximum number of top documents to return (-1 for no limit)
- getTopK()[source]#
Gets maximum number of top documents to return.
- Returns:
- int
Maximum number of top documents to return
- setMinRelevanceScore(value)[source]#
Sets minimum relevance score threshold.
- Parameters:
- value : float
Minimum relevance score threshold
- getMinRelevanceScore()[source]#
Gets minimum relevance score threshold.
- Returns:
- float
Minimum relevance score threshold