package finisher

Type Members

case class DocumentSimilarityRankerFinisher(uid: String) extends Transformer with DefaultParamsWritable with Product with Serializable

case class GGUFRankingFinisher(uid: String) extends Transformer with DefaultParamsWritable with Product with Serializable

Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.

This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:

Top-k selection: Select only the top k documents by relevance score
Score thresholding: Filter documents by minimum relevance score
Min-max scaling: Normalize relevance scores linearly to the range [0, 1]
Sorting: Sort documents by relevance score in descending order
Ranking: Add rank information to document metadata

The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
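The options listed above can be modeled in plain Scala without Spark. The following is a minimal sketch, not the GGUFRankingFinisher implementation: `ScoredDoc`, `RankingSketch`, `minMaxScale`, and `finish` are hypothetical names, and the order of operations (scale, then threshold, then sort, then top-k, then rank) as well as the handling of identical scores are assumptions made for illustration.

```scala
// Illustrative model only; all names here are hypothetical and the
// ordering of steps is an assumption, not the library's documented behavior.
final case class ScoredDoc(text: String, score: Double, rank: Option[Int] = None)

object RankingSketch {

  // Min-max scaling: map scores linearly onto [0, 1].
  def minMaxScale(docs: Seq[ScoredDoc]): Seq[ScoredDoc] =
    if (docs.isEmpty) docs
    else {
      val lo = docs.map(_.score).min
      val hi = docs.map(_.score).max
      val span = hi - lo
      // Assumption: if every score is identical, map all of them to 1.0.
      docs.map(d => d.copy(score = if (span == 0.0) 1.0 else (d.score - lo) / span))
    }

  def finish(
      docs: Seq[ScoredDoc],
      topK: Int,
      minScore: Double,
      scale: Boolean): Seq[ScoredDoc] = {
    val scaled = if (scale) minMaxScale(docs) else docs
    scaled
      .filter(_.score >= minScore)                       // score thresholding
      .sortBy(d => -d.score)                             // descending relevance
      .take(topK)                                        // top-k selection
      .zipWithIndex
      .map { case (d, i) => d.copy(rank = Some(i + 1)) } // 1-based rank
  }
}
```

With raw scores 1.0, 3.0, 2.0, and 5.0 for documents a, b, c, and d, calling `RankingSketch.finish(docs, topK = 2, minScore = 0.1, scale = true)` in this sketch keeps d (scaled score 1.0, rank 1) and b (scaled score 0.5, rank 2).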

Example

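import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
// AutoGGUFReranker lives in the seq2seq subpackage, which the wildcard above does not cover
import com.johnsnowlabs.nlp.annotators.seq2seq.AutoGGUFReranker
import com.johnsnowlabs.nlp.finisher._
import org.apache.spark.ml.Pipeline
// Requires an active SparkSession named `spark` (as in spark-shell)
import spark.implicits._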

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val reranker = AutoGGUFReranker
  .pretrained("bge_reranker_v2_m3-Q4_K_M")
  .setInputCols("document")
  .setOutputCol("reranked_documents")
  .setQuery("A man is eating pasta.")

val finisher = new GGUFRankingFinisher()
  .setInputCols("reranked_documents")
  .setOutputCol("ranked_documents")
  .setTopK(3)
  .setMinRelevanceScore(0.1)
  .setMinMaxScaling(true)

val pipeline = new Pipeline().setStages(Array(document, reranker, finisher))

val data = Seq(
  "A man is eating food.",
  "A man is eating a piece of bread.",
  "The girl is carrying a baby.",
  "A man is riding a horse."
).toDF("text")

val result = pipeline.fit(data).transform(data)
result.select("ranked_documents").show(truncate = false)
// Documents will be sorted by relevance with rank information in metadata

uid: required uid for storing finisher to disk

Value Members

object DocumentSimilarityRankerFinisher extends DefaultParamsReadable[DocumentSimilarityRankerFinisher] with Serializable
object GGUFRankingFinisher extends DefaultParamsReadable[GGUFRankingFinisher] with Serializable
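Because the finisher classes mix in DefaultParamsWritable and their companion objects extend DefaultParamsReadable, a configured finisher can be saved and reloaded through the standard Spark ML persistence calls. The sketch below assumes Spark and Spark NLP are on the classpath; the local SparkSession settings and the temporary directory are illustrative choices, not requirements of the library.

```scala
import java.nio.file.Files

import com.johnsnowlabs.nlp.finisher.GGUFRankingFinisher
import org.apache.spark.sql.SparkSession

// Assumption: a local session is sufficient for this demonstration.
val spark = SparkSession.builder()
  .appName("finisher-persistence-sketch")
  .master("local[*]")
  .getOrCreate()

// Setters taken from the package example above.
val finisher = new GGUFRankingFinisher()
  .setInputCols("reranked_documents")
  .setOutputCol("ranked_documents")
  .setTopK(3)

// Hypothetical location: any writable path works.
val path = Files.createTempDirectory("gguf_ranking_finisher").resolve("model").toString

// `write` comes from DefaultParamsWritable; `load` from DefaultParamsReadable.
finisher.write.overwrite().save(path)
val restored = GGUFRankingFinisher.load(path)
```

The reloaded instance is constructed with the stored uid, which is why the uid is described as required for storing the finisher to disk.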
