package finisher

Type Members

  1. case class DocumentSimilarityRankerFinisher(uid: String) extends Transformer with DefaultParamsWritable with Product with Serializable
  2. case class GGUFRankingFinisher(uid: String) extends Transformer with DefaultParamsWritable with Product with Serializable

    Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.

    This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:

    • Top-k selection: Select only the top k documents by relevance score
    • Score thresholding: Filter documents by minimum relevance score
    • Min-max scaling: Normalize relevance scores to the 0-1 range
    • Sorting: Sort documents by relevance score in descending order
    • Ranking: Add rank information to document metadata
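    The options listed above can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the finisher's actual implementation: `ScoredDoc` and `RankingSketch` are hypothetical names, and the order of operations (scale, then threshold, then sort, then top-k, then rank) is an assumption, since this page does not state it.

```scala
// Hypothetical stand-in for a reranked document; not a Spark NLP type.
final case class ScoredDoc(text: String, score: Double)

object RankingSketch {
  // Min-max scaling: map scores linearly onto the 0-1 range.
  // If all scores are equal there is no spread, so this sketch assigns 1.0 to each.
  def minMaxScale(docs: Seq[ScoredDoc]): Seq[ScoredDoc] =
    if (docs.isEmpty) docs
    else {
      val lo = docs.map(_.score).min
      val hi = docs.map(_.score).max
      val range = hi - lo
      docs.map(d => d.copy(score = if (range == 0.0) 1.0 else (d.score - lo) / range))
    }

  // Assumed order: scale -> threshold -> sort descending -> top-k -> 1-based rank.
  def rank(
      docs: Seq[ScoredDoc],
      topK: Int,
      minScore: Double,
      scale: Boolean): Seq[(Int, ScoredDoc)] = {
    val scaled = if (scale) minMaxScale(docs) else docs
    scaled
      .filter(_.score >= minScore) // score thresholding
      .sortBy(d => -d.score)       // highest relevance first
      .take(topK)                  // top-k selection
      .zipWithIndex
      .map { case (d, i) => (i + 1, d) } // ranks start at 1
  }
}
```

    Under these assumptions, the threshold is compared against the scaled scores when scaling is enabled, so the same `minScore` can keep a different set of documents depending on the scaling flag.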

    The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
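    What "preserving the annotation structure" could look like is sketched below in plain Scala. `Ann` is a hypothetical, simplified annotation, and the metadata keys `relevance_score` and `rank` are assumptions for illustration; this page does not name the keys the library uses.

```scala
import scala.util.Try

// Hypothetical simplified annotation; the real Spark NLP Annotation has more fields.
final case class Ann(result: String, metadata: Map[String, String])

object RankMetadataSketch {
  // Assumed metadata keys, chosen for illustration only.
  val ScoreKey = "relevance_score"
  val RankKey = "rank"

  // Read the score from metadata; missing or malformed scores sort last.
  private def scoreOf(a: Ann): Double =
    a.metadata.get(ScoreKey).flatMap(s => Try(s.toDouble).toOption)
      .getOrElse(Double.NegativeInfinity)

  // Sort by score descending and add a 1-based rank entry to each annotation.
  // The text and all existing metadata entries are left unchanged.
  def addRanks(anns: Seq[Ann]): Seq[Ann] =
    anns
      .sortBy(a => -scoreOf(a))
      .zipWithIndex
      .map { case (a, i) => a.copy(metadata = a.metadata + (RankKey -> (i + 1).toString)) }
}
```

    Only the metadata map grows by one entry, so downstream stages that read the annotation text or other metadata fields would be unaffected.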

    Example

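    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotators._
    // Assumption: AutoGGUFReranker lives in the seq2seq subpackage, which the wildcard above does not cover
    import com.johnsnowlabs.nlp.annotators.seq2seq.AutoGGUFReranker
    import com.johnsnowlabs.nlp.finisher._
    import org.apache.spark.ml.Pipeline
    import spark.implicits._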
    
    val document = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val reranker = AutoGGUFReranker
      .pretrained("bge_reranker_v2_m3-Q4_K_M")
      .setInputCols("document")
      .setOutputCol("reranked_documents")
      .setQuery("A man is eating pasta.")
    
    val finisher = new GGUFRankingFinisher()
      .setInputCols("reranked_documents")
      .setOutputCol("ranked_documents")
      .setTopK(3)
      .setMinRelevanceScore(0.1)
      .setMinMaxScaling(true)
    
    val pipeline = new Pipeline().setStages(Array(document, reranker, finisher))
    
    val data = Seq(
      "A man is eating food.",
      "A man is eating a piece of bread.",
      "The girl is carrying a baby.",
      "A man is riding a horse."
    ).toDF("text")
    
    val result = pipeline.fit(data).transform(data)
    result.select("ranked_documents").show(truncate = false)
    // Documents will be sorted by relevance with rank information in metadata

    uid
      Required unique identifier (uid), used when storing the finisher to disk.

Value Members

  1. object DocumentSimilarityRankerFinisher extends DefaultParamsReadable[DocumentSimilarityRankerFinisher] with Serializable
  2. object GGUFRankingFinisher extends DefaultParamsReadable[GGUFRankingFinisher] with Serializable
