case class GGUFRankingFinisher(uid: String) extends Transformer with DefaultParamsWritable with Product with Serializable

Finisher for AutoGGUFReranker outputs that provides ranking capabilities including top-k selection, sorting by relevance score, and score normalization.

This finisher processes the output of AutoGGUFReranker, which contains documents with relevance scores in their metadata. It provides several options for post-processing:

  • Top-k selection: Select only the top k documents by relevance score
  • Score thresholding: Filter documents by minimum relevance score
  • Min-max scaling: Normalize relevance scores to the 0-1 range
  • Sorting: Sort documents by relevance score in descending order
  • Ranking: Add rank information to document metadata

The finisher preserves the document annotation structure while adding ranking information to the metadata and optionally filtering/sorting the documents.
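
The options listed above can be pictured with a small plain-Scala sketch that does not depend on Spark or Spark NLP. The names `ScoredDoc`, `RankedDoc`, and `RankingSketch` are hypothetical, and the order of the steps (scale, then threshold, then sort, then top-k, then rank) is an assumption made for illustration; the finisher's actual implementation may differ, for example in whether the threshold applies to raw or scaled scores.

```scala
// Hypothetical stand-ins for a reranked document and its ranked form.
final case class ScoredDoc(text: String, score: Double)
final case class RankedDoc(text: String, score: Double, rank: Int)

object RankingSketch {

  // Min-max scaling: map scores linearly onto [0, 1].
  def minMaxScale(docs: Seq[ScoredDoc]): Seq[ScoredDoc] =
    if (docs.isEmpty) docs
    else {
      val lo = docs.map(_.score).min
      val hi = docs.map(_.score).max
      val range = hi - lo
      // Assumption: if all scores are equal, every document gets 1.0.
      docs.map(d => d.copy(score = if (range == 0.0) 1.0 else (d.score - lo) / range))
    }

  // Assumed order: scale -> threshold -> sort descending -> top-k -> assign ranks.
  def rank(docs: Seq[ScoredDoc], topK: Int, minScore: Double, scale: Boolean): Seq[RankedDoc] = {
    val scaled = if (scale) minMaxScale(docs) else docs
    scaled
      .filter(_.score >= minScore)   // score thresholding
      .sortBy(d => -d.score)         // highest relevance first
      .take(topK)                    // top-k selection
      .zipWithIndex
      .map { case (d, i) => RankedDoc(d.text, d.score, i + 1) } // ranks start at 1
  }
}
```

With scaling enabled, the lowest-scoring document always maps to 0.0, so under these assumptions any positive threshold removes it.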

Example

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.finisher._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val reranker = AutoGGUFReranker
  .pretrained("bge_reranker_v2_m3-Q4_K_M")
  .setInputCols("document")
  .setOutputCol("reranked_documents")
  .setQuery("A man is eating pasta.")

val finisher = new GGUFRankingFinisher()
  .setInputCols("reranked_documents")
  .setOutputCol("ranked_documents")
  .setTopK(3)
  .setMinRelevanceScore(0.1)
  .setMinMaxScaling(true)

val pipeline = new Pipeline().setStages(Array(document, reranker, finisher))

val data = Seq(
  "A man is eating food.",
  "A man is eating a piece of bread.",
  "The girl is carrying a baby.",
  "A man is riding a horse."
).toDF("text")

val result = pipeline.fit(data).transform(data)
result.select("ranked_documents").show(truncate = false)
// Documents are sorted by relevance, with rank information added to the metadata

uid

Required uid for storing the finisher to disk

Linear Supertypes
Product, Equals, DefaultParamsWritable, MLWritable, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any
Instance Constructors

  1. new GGUFRankingFinisher()
  2. new GGUFRankingFinisher(uid: String)

    uid

    Required uid for storing the finisher to disk

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. val QUERY_COL_NAME: String
  6. val RANK_COL_NAME: String
  7. val RELEVANCE_SCORE_COL_NAME: String
  8. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  9. final def clear(param: Param[_]): GGUFRankingFinisher.this.type
    Definition Classes
    Params
  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  11. def copy(extra: ParamMap): Transformer
    Definition Classes
    GGUFRankingFinisher → Transformer → PipelineStage → Params
  12. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  13. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  14. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  16. def explainParams(): String
    Definition Classes
    Params
  17. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  18. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  19. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  21. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  22. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  23. def getInputCols: Array[String]

    Names of the input annotation columns containing reranked documents

  24. def getMinMaxScaling: Boolean

    Get whether to apply min-max scaling

  25. def getMinRelevanceScore: Double

    Get minimum relevance score threshold

  26. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  27. def getOutputCol: String

    Name of output annotation column containing ranked documents

  28. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  29. def getTopK: Int

    Get maximum number of top documents to return

  30. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  31. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  32. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  33. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. val inputCols: StringArrayParam

    Names of the input annotation columns containing reranked documents

  35. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  36. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  37. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  38. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  39. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  40. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  44. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  47. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  49. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  50. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. val minMaxScaling: BooleanParam

    Whether to apply min-max scaling to normalize relevance scores to the 0-1 range

  52. val minRelevanceScore: DoubleParam

    Minimum relevance score threshold for filtering documents

  53. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  54. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  55. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  56. val outputCol: StringArrayParam

    Name of output annotation column containing ranked documents

  57. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  58. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  59. final def set(paramPair: ParamPair[_]): GGUFRankingFinisher.this.type
    Attributes
    protected
    Definition Classes
    Params
  60. final def set(param: String, value: Any): GGUFRankingFinisher.this.type
    Attributes
    protected
    Definition Classes
    Params
  61. final def set[T](param: Param[T], value: T): GGUFRankingFinisher.this.type
    Definition Classes
    Params
  62. final def setDefault(paramPairs: ParamPair[_]*): GGUFRankingFinisher.this.type
    Attributes
    protected
    Definition Classes
    Params
  63. final def setDefault[T](param: Param[T], value: T): GGUFRankingFinisher.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  64. def setInputCols(value: String*): GGUFRankingFinisher.this.type

    Names of the input annotation columns containing reranked documents

  65. def setInputCols(value: Array[String]): GGUFRankingFinisher.this.type

    Names of the input annotation columns containing reranked documents

  66. def setMinMaxScaling(value: Boolean): GGUFRankingFinisher.this.type

    Set whether to apply min-max scaling

  67. def setMinRelevanceScore(value: Double): GGUFRankingFinisher.this.type

    Set minimum relevance score threshold

  68. def setOutputCol(value: String): GGUFRankingFinisher.this.type

    Name of output annotation column containing ranked documents

  69. def setTopK(value: Int): GGUFRankingFinisher.this.type

    Set maximum number of top documents to return

  70. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  71. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  72. val topK: IntParam

    Maximum number of top documents to return based on relevance score

  73. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    GGUFRankingFinisher → Transformer
  74. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  75. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  76. def transformSchema(schema: StructType): StructType
    Definition Classes
    GGUFRankingFinisher → PipelineStage
  77. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  78. val uid: String
    Definition Classes
    GGUFRankingFinisher → Identifiable
  79. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  80. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  81. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  82. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable

Parameters

A list of (hyper-)parameter keys this finisher can take. Users can set and get the parameter values through setters and getters, respectively.
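
As a brief illustration of the setter and getter pattern, the sketch below configures a finisher and reads the values back. It assumes Spark NLP is on the classpath and reuses the import from the class example; the column names and parameter values are arbitrary.

```scala
import com.johnsnowlabs.nlp.finisher._

// Setters return the finisher itself, so calls can be chained.
val finisher = new GGUFRankingFinisher()
  .setInputCols("reranked_documents")
  .setOutputCol("ranked_documents")
  .setTopK(5)
  .setMinRelevanceScore(0.2)
  .setMinMaxScaling(false)

// Getters read back the configured values.
val k: Int = finisher.getTopK
val threshold: Double = finisher.getMinRelevanceScore
val scaling: Boolean = finisher.getMinMaxScaling

// explainParams() is inherited from Spark ML's Params and
// describes every parameter along with its current value.
println(finisher.explainParams())
```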