
class BertForSequenceClassification extends AnnotatorModel[BertForSequenceClassification] with HasBatchedAnnotate[BertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine

BertForSequenceClassification can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

Pretrained models can be loaded with the pretrained method of the companion object:

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_base_sequence_classifier_imdb" if no name is provided.
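
Because pretrained() falls back to this default, the call below should load the same model with the name spelled out. This is a minimal sketch under assumptions: it creates a local SparkSession (needed for the download), and the language code "en" is assumed to be this model's default language rather than taken from this page.

```scala
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.sql.SparkSession

// Assumed local session; pretrained() needs an active SparkSession to fetch the model
val spark = SparkSession.builder()
  .appName("bert-sequence-classifier")
  .master("local[*]")
  .getOrCreate()

// Same model as pretrained() with no arguments; "en" is an assumed language code
val sequenceClassifier = BertForSequenceClassification
  .pretrained("bert_base_sequence_classifier_imdb", "en")
  .setInputCols("token", "document")
  .setOutputCol("label")
```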

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see BertForSequenceClassificationTestSpec.
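
Importing an exported model could look like the sketch below. It rests on assumptions: /tmp/exported_bert_sequence_classifier is a hypothetical placeholder for a model exported as described in the linked discussion, and loadSavedModel(path, spark) is the companion-object importer that the engine parameter further down refers to. After importing, the annotator is saved in Spark NLP's own format and reloaded with load, so the export does not have to be repeated.

```scala
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("import-bert-sequence-classifier")
  .master("local[*]")
  .getOrCreate()

// Hypothetical directory holding the exported model and its assets (vocabulary, labels)
val exportedModelPath = "/tmp/exported_bert_sequence_classifier"

val imported = BertForSequenceClassification
  .loadSavedModel(exportedModelPath, spark)
  .setInputCols("token", "document")
  .setOutputCol("label")

// Persist in Spark NLP format (hypothetical path), then reload without re-importing
val sparkNlpModelPath = "/tmp/bert_sequence_classifier_spark_nlp"
imported.write.overwrite().save(sparkNlpModelPath)

val reloaded = BertForSequenceClassification.load(sparkNlpModelPath)
println(reloaded.getClasses.mkString(", "))
```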

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+

See also

BertForSequenceClassification for sequence-level classification

Annotators Main Page for a list of transformer based classifiers

Parameters

A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.

  1. val activation: Param[String]

    Whether to calculate logits via softmax or sigmoid (Default: softmax).

    Definition Classes
    HasClassifierActivationProperties
  2. val batchSize: IntParam

    Size of every batch (Default depends on model).

    Definition Classes
    HasBatchedAnnotate
  3. val caseSensitive: BooleanParam

    Whether index lookups are case sensitive (Default depends on model).

    Definition Classes
    HasCaseSensitiveProperties
  4. val coalesceSentences: BooleanParam

    Instead of one class per sentence (if inputCols is a sentence column), output one class per document by averaging the probabilities of all sentences (Default: false).

    Because almost all transformer models, such as BERT, have a maximum sequence length (512 tokens), this parameter makes it possible to feed all sentences of a document into the model and average their probabilities over the entire document instead of returning probabilities per sentence.

  5. val configProtoBytes: IntArrayParam

    ConfigProto from TensorFlow, serialized into a byte array. Get with config_proto.SerializeToString().

  6. val engine: Param[String]

    This param is set internally once via loadSavedModel. That's why there is no setter

    Definition Classes
    HasEngine
  7. val labels: MapFeature[String, Int]

    Labels used to decode predicted IDs back to string tags

  8. val maxSentenceLength: IntParam

    Max sentence length to process (Default: 128)

  9. val multilabel: BooleanParam

    Whether the result should be multi-class (the probabilities of all labels sum to 1.0) or multi-label (each label has its own probability between 0.0 and 1.0). Default is false, i.e. multi-class.

    Definition Classes
    HasClassifierActivationProperties
  10. def setMultilabel(value: Boolean): BertForSequenceClassification.this.type

    Sets whether the result should be multi-class (the probabilities of all labels sum to 1.0) or multi-label (each label has its own probability between 0.0 and 1.0). Default is false, i.e. multi-class.

    Definition Classes
    HasClassifierActivationProperties
  11. def setThreshold(threshold: Float): BertForSequenceClassification.this.type

    Choose the threshold to determine which logits are considered to be positive or negative. (Default: 0.5f). The value should be between 0.0 and 1.0. Changing the threshold value will affect the resulting labels and can be used to adjust the balance between precision and recall in the classification process.

    Definition Classes
    HasClassifierActivationProperties
  12. val signatures: MapFeature[String, String]

    TF model signatures for the loaded saved model.

  13. val threshold: FloatParam

    Choose the threshold to determine which logits are considered to be positive or negative. (Default: 0.5f). The value should be between 0.0 and 1.0. Changing the threshold value will affect the resulting labels and can be used to adjust the balance between precision and recall in the classification process.

    Definition Classes
    HasClassifierActivationProperties
  14. val vocabulary: MapFeature[String, Int]

    Vocabulary used to encode the words to ids with WordPieceEncoder
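
The activation, multilabel, threshold, and coalesceSentences parameters above all describe how raw logits become labels. The plain-Scala sketch below illustrates those rules in isolation. ActivationSketch and its methods are hypothetical helpers written for illustration, not Spark NLP's internal implementation, and comparing scores to the threshold with >= is an assumption. For example, multiClass(Array("neg", "pos"), Array(-1.0f, 2.0f)) returns "pos".

```scala
// Hypothetical helpers illustrating the documented behaviour; NOT Spark NLP internals.
object ActivationSketch {

  // Softmax: scores are positive and sum to 1.0 (multi-class, the default)
  def softmax(logits: Array[Float]): Array[Float] = {
    val max = logits.max // subtract the max logit for numerical stability
    val exps = logits.map(l => math.exp((l - max).toDouble))
    val total = exps.sum
    exps.map(e => (e / total).toFloat)
  }

  // Sigmoid: each score lies independently between 0.0 and 1.0 (multi-label)
  def sigmoid(logits: Array[Float]): Array[Float] =
    logits.map(l => (1.0 / (1.0 + math.exp(-l.toDouble))).toFloat)

  // Multi-class: return the single label with the highest softmax score
  def multiClass(labels: Array[String], logits: Array[Float]): String = {
    val scores = softmax(logits)
    labels(scores.indexOf(scores.max))
  }

  // Multi-label: keep every label whose sigmoid score reaches the threshold (assumed >=)
  def multiLabel(labels: Array[String], logits: Array[Float], threshold: Float = 0.5f): Seq[String] =
    labels.zip(sigmoid(logits)).collect { case (label, score) if score >= threshold => label }.toSeq

  // coalesceSentences-style averaging: element-wise mean of per-sentence score vectors
  def averageScores(perSentence: Seq[Array[Float]]): Array[Float] =
    perSentence.map(_.toSeq).transpose.map(col => col.sum / col.size).toArray
}
```

Lowering the threshold in multiLabel admits more labels (higher recall, lower precision), which mirrors the trade-off described for setThreshold.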

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[String]

    Input Annotator Types: DOCUMENT, TOKEN

    Definition Classes
    BertForSequenceClassification → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output Annotator Types: CATEGORY

    Definition Classes
    BertForSequenceClassification → HasOutputAnnotatorType
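
The DOCUMENT input only has to carry DOCUMENT-type annotations, so a sentence column can take its place; this is the setup the coalesceSentences parameter refers to. The sketch below assumes a local SparkSession and uses SentenceDetector (which emits DOCUMENT-type annotations) to split each text into sentences. With setCoalesceSentences(true), the parameter description says the output should be one label per document, averaged over its sentences. The sample text is made up for illustration.

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bert-sequence-classifier-sentences")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// One DOCUMENT-type annotation per detected sentence
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "sentence") // TOKEN + sentence-level DOCUMENT inputs
  .setOutputCol("label")
  .setCoalesceSentences(true)        // average sentence probabilities into one label per document

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child. The ending still makes me smile.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("label.result").show(false)
```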

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]

    Takes a document and annotations and produces new annotations of this annotator's annotation type.

    batchedAnnotations

    Annotations that correspond to inputAnnotationCols generated by previous annotators, if any

    returns

    Any number of annotations processed for every input annotation. Not necessarily a one-to-one relationship.

    Definition Classes
    BertForSequenceClassification → HasBatchedAnnotate
  2. def batchProcess(rows: Iterator[_]): Iterator[Row]
    Definition Classes
    HasBatchedAnnotate
  3. final def clear(param: Param[_]): BertForSequenceClassification.this.type
    Definition Classes
    Params
  4. def copy(extra: ParamMap): BertForSequenceClassification

    Requirement for annotator copies.

    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  10. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  11. def getClasses: Array[String]

    Returns labels used to train this model

  12. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  13. def getInputCols: Array[String]

    returns

    input annotation columns currently used

    Definition Classes
    HasInputAnnotationCols
  14. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  15. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  16. final def getOutputCol: String

    Gets the name of the annotation column this annotator generates.

    Definition Classes
    HasOutputAnnotationCol
  17. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  18. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  19. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  20. def hasParent: Boolean
    Definition Classes
    Model
  21. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  22. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  23. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  24. def onWrite(path: String, spark: SparkSession): Unit
  25. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  26. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  27. var parent: Estimator[BertForSequenceClassification]
    Definition Classes
    Model
  28. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  29. final def set[T](param: Param[T], value: T): BertForSequenceClassification.this.type
    Definition Classes
    Params
  30. final def setInputCols(value: String*): BertForSequenceClassification.this.type
    Definition Classes
    HasInputAnnotationCols
  31. def setInputCols(value: Array[String]): BertForSequenceClassification.this.type

    Overrides the required annotator columns if they differ from the default.

    Definition Classes
    HasInputAnnotationCols
  32. def setLazyAnnotator(value: Boolean): BertForSequenceClassification.this.type
    Definition Classes
    CanBeLazy
  33. final def setOutputCol(value: String): BertForSequenceClassification.this.type

    Overrides the annotation column name when transforming.

    Definition Classes
    HasOutputAnnotationCol
  34. def setParent(parent: Estimator[BertForSequenceClassification]): BertForSequenceClassification
    Definition Classes
    Model
  35. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  36. final def transform(dataset: Dataset[_]): DataFrame

    Given that the requirements are met, this applies the ML transformation within a Pipeline or stand-alone. The output annotation is generated as a new column, and previous annotations remain available separately. Metadata is built at the schema level to record the annotations' structural information outside their content.

    dataset

    Dataset[Row]

    Definition Classes
    AnnotatorModel → Transformer
  37. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  38. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  39. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit().

    Definition Classes
    RawAnnotator → PipelineStage
  40. val uid: String
    Definition Classes
    BertForSequenceClassification → Identifiable
  41. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
  42. def writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOnnxModel
  43. def writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOnnxModel
  44. def writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOpenvinoModel
  45. def writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOpenvinoModel
  46. def writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String = "_use"): Unit
    Definition Classes
    WriteTensorflowModel
  47. def writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None): Unit
    Definition Classes
    WriteTensorflowModel
  48. def writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None, savedSignatures: Option[Map[String, String]] = None): Unit
    Definition Classes
    WriteTensorflowModel
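
A few of the members above are handy for inspecting a loaded model. The sketch assumes a local SparkSession and the default pretrained model; it only calls members listed on this page (getClasses, getInputCols, getOutputCol, explainParam) and prints the labels rather than assuming particular label names.

```scala
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bert-sequence-classifier-members")
  .master("local[*]")
  .getOrCreate()

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

// Labels the model was trained with
println(sequenceClassifier.getClasses.mkString(", "))

// Columns configured above
println(sequenceClassifier.getInputCols.mkString(", ")) // prints "token, document"
println(sequenceClassifier.getOutputCol)                  // prints "label"

// Name, documentation, default and current value of a single param
println(sequenceClassifier.explainParam(sequenceClassifier.threshold))
```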

Parameter setters

  1. def sentenceEndTokenId: Int

  2. def sentenceStartTokenId: Int

  3. def setActivation(value: String): BertForSequenceClassification.this.type

  4. def setBatchSize(size: Int): BertForSequenceClassification.this.type

    Size of every batch.

    Definition Classes
    HasBatchedAnnotate
  5. def setCaseSensitive(value: Boolean): BertForSequenceClassification.this.type

    Whether the model is case sensitive; if false, tokens are lowercased (Default: true).

    Definition Classes
    BertForSequenceClassification → HasCaseSensitiveProperties
  6. def setCoalesceSentences(value: Boolean): BertForSequenceClassification.this.type

  7. def setConfigProtoBytes(bytes: Array[Int]): BertForSequenceClassification.this.type

  8. def setLabels(value: Map[String, Int]): BertForSequenceClassification.this.type

  9. def setMaxSentenceLength(value: Int): BertForSequenceClassification.this.type

  10. def setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper], openvinoWrapper: Option[OpenvinoWrapper]): BertForSequenceClassification

  11. def setSignatures(value: Map[String, String]): BertForSequenceClassification.this.type

  12. def setVocabulary(value: Map[String, Int]): BertForSequenceClassification.this.type

Parameter getters

  1. def getActivation: String

  2. def getBatchSize: Int

    Size of every batch.

    Definition Classes
    HasBatchedAnnotate
  3. def getCaseSensitive: Boolean

    Definition Classes
    HasCaseSensitiveProperties
  4. def getCoalesceSentences: Boolean

  5. def getConfigProtoBytes: Option[Array[Byte]]

  6. def getEngine: String

    Definition Classes
    HasEngine
  7. def getMaxSentenceLength: Int

  8. def getModelIfNotSet: BertClassification

  9. def getSignatures: Option[Map[String, String]]
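
The setters return the annotator itself (this.type), so they can be chained, and the matching getters read the values back. A minimal sketch assuming a local SparkSession and the default pretrained model; the chosen values (batch size 8, maximum sentence length 256, sentence coalescing on) are illustrative, not recommendations.

```scala
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bert-sequence-classifier-params")
  .master("local[*]")
  .getOrCreate()

// Setters return this.type, so the calls chain
val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setBatchSize(8)            // size of every batch
  .setMaxSentenceLength(256)  // max sentence length to process
  .setCaseSensitive(true)
  .setCoalesceSentences(true)

// Getters return the values set above
println(sequenceClassifier.getBatchSize)         // prints 8
println(sequenceClassifier.getMaxSentenceLength) // prints 256
println(sequenceClassifier.getCaseSensitive)     // prints true
println(sequenceClassifier.getCoalesceSentences) // prints true

// The engine is set internally when the model is loaded; there is no setter
println(sequenceClassifier.getEngine)
```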