
class BertForTokenClassification
  extends AnnotatorModel[BertForTokenClassification]
  with HasBatchedAnnotate[BertForTokenClassification]
  with WriteTensorflowModel
  with WriteOnnxModel
  with WriteOpenvinoModel
  with HasCaseSensitiveProperties
  with HasEngine

BertForTokenClassification can load BERT models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

Pretrained models can be loaded with the pretrained method of the companion object:

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_base_token_classifier_conll03", if no name is provided.
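
A specific model can also be requested by name and language code; a minimal sketch (the name below is the documented default, and "en" is assumed as the language code):

val tokenClassifier = BertForTokenClassification
  .pretrained("bert_base_token_classifier_conll03", "en")
  .setInputCols("token", "document")
  .setOutputCol("label")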

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see BertForTokenClassificationTestSpec.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
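
To pair each token with its predicted tag, the two annotation result arrays can be zipped and exploded; a minimal sketch, assuming the result DataFrame from above (the field-access syntax follows the pattern used elsewhere in the Spark NLP docs and may vary across Spark versions):

result.selectExpr("explode(arrays_zip(token.result, label.result)) as cols")
  .selectExpr("cols['0'] as token", "cols['1'] as ner_tag")
  .show(5, false)
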
See also

BertForSequenceClassification for sequence-level classification

Annotators Main Page for a list of transformer-based classifiers

Inherited
  1. BertForTokenClassification
  2. HasEngine
  3. HasCaseSensitiveProperties
  4. WriteOpenvinoModel
  5. WriteOnnxModel
  6. WriteTensorflowModel
  7. HasBatchedAnnotate
  8. AnnotatorModel
  9. CanBeLazy
  10. RawAnnotator
  11. HasOutputAnnotationCol
  12. HasInputAnnotationCols
  13. HasOutputAnnotatorType
  14. ParamsAndFeaturesWritable
  15. HasFeatures
  16. DefaultParamsWritable
  17. MLWritable
  18. Model
  19. Transformer
  20. PipelineStage
  21. Logging
  22. Params
  23. Serializable
  24. Serializable
  25. Identifiable
  26. AnyRef
  27. Any

Parameters

A list of the (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively. A short configuration sketch follows the list.

  1. val batchSize: IntParam

    Size of every batch (Default depends on model).

    Definition Classes
    HasBatchedAnnotate
  2. val caseSensitive: BooleanParam

    Whether to ignore case in index lookups (Default depends on model)

    Definition Classes
    HasCaseSensitiveProperties
  3. val configProtoBytes: IntArrayParam

    ConfigProto from tensorflow, serialized into a byte array. Get with config_proto.SerializeToString()

  4. val engine: Param[String]

    This param is set internally once via loadSavedModel; that is why there is no setter.

    Definition Classes
    HasEngine
  5. val labels: MapFeature[String, Int]

    Labels used to decode predicted IDs back to string tags

  6. val maxSentenceLength: IntParam

    Max sentence length to process (Default: 128)

  7. val signatures: MapFeature[String, String]

    Contains the TF model signatures for the loaded saved model

  8. val vocabulary: MapFeature[String, Int]

    Vocabulary used to encode the words to ids with WordPieceEncoder
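
A minimal configuration sketch using some of these parameters (the values shown are illustrative, not recommendations):

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)      // keep casing for index lookups
  .setMaxSentenceLength(128)   // the documented default
  .setBatchSize(8)             // illustrative value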

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[String]

    Input Annotator Types: DOCUMENT, TOKEN

    Definition Classes
    BertForTokenClassification → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output Annotator Types: NAMED_ENTITY

    Definition Classes
    BertForTokenClassification → HasOutputAnnotatorType

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]

    Takes a document and annotations and produces new annotations of this annotator's annotation type.

    batchedAnnotations

    Annotations that correspond to inputAnnotationCols generated by previous annotators if any

    returns

    any number of annotations processed for every input annotation; not necessarily a one-to-one relationship

    Definition Classes
    BertForTokenClassification → HasBatchedAnnotate
  2. def batchProcess(rows: Iterator[_]): Iterator[Row]
    Definition Classes
    HasBatchedAnnotate
  3. final def clear(param: Param[_]): BertForTokenClassification.this.type
    Definition Classes
    Params
  4. def copy(extra: ParamMap): BertForTokenClassification

    Requirement for annotator copies.

    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  10. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  11. def getClasses: Array[String]

    Returns labels used to train this model

  12. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  13. def getInputCols: Array[String]

    returns

    input annotations columns currently used

    Definition Classes
    HasInputAnnotationCols
  14. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  15. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  16. final def getOutputCol: String

    Gets the name of the annotation column to be generated.

    Definition Classes
    HasOutputAnnotationCol
  17. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  18. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  19. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  20. def hasParent: Boolean
    Definition Classes
    Model
  21. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  22. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  23. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  24. def onWrite(path: String, spark: SparkSession): Unit
  25. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  26. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  27. var parent: Estimator[BertForTokenClassification]
    Definition Classes
    Model
  28. def save(path: String): Unit

    Saves this annotator to the given path (a save/load sketch follows this member list).

    Definition Classes
    MLWritable
    Annotations
    @Since("1.6.0") @throws(...)
  29. final def set[T](param: Param[T], value: T): BertForTokenClassification.this.type
    Definition Classes
    Params
  30. final def setInputCols(value: String*): BertForTokenClassification.this.type
    Definition Classes
    HasInputAnnotationCols
  31. def setInputCols(value: Array[String]): BertForTokenClassification.this.type

    Overrides the required annotator input columns if different from the default.

    Definition Classes
    HasInputAnnotationCols
  32. def setLazyAnnotator(value: Boolean): BertForTokenClassification.this.type
    Definition Classes
    CanBeLazy
  33. final def setOutputCol(value: String): BertForTokenClassification.this.type

    Overrides the annotation column name used when transforming.

    Definition Classes
    HasOutputAnnotationCol
  34. def setParent(parent: Estimator[BertForTokenClassification]): BertForTokenClassification
    Definition Classes
    Model
  35. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  36. final def transform(dataset: Dataset[_]): DataFrame

    Given requirements are met, this applies the ML transformation within a Pipeline or stand-alone. The output annotation will be generated as a new column; previous annotations are still available separately. Metadata is built at the schema level to record the annotations' structural information outside of their content.

    dataset

    Dataset[Row]

    Definition Classes
    AnnotatorModel → Transformer
  37. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  38. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  39. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit().

    Definition Classes
    RawAnnotator → PipelineStage
  40. val uid: String
    Definition Classes
    BertForTokenClassification → Identifiable
  41. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
  42. def writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOnnxModel
  43. def writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOnnxModel
  44. def writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOpenvinoModel
  45. def writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOpenvinoModel
  46. def writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String = "_use"): Unit
    Definition Classes
    WriteTensorflowModel
  47. def writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None): Unit
    Definition Classes
    WriteTensorflowModel
  48. def writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None, savedSignatures: Option[Map[String, String]] = None): Unit
    Definition Classes
    WriteTensorflowModel
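
A minimal save/load sketch for the save member above; the path is illustrative, and load is provided through the companion object via MLReadable:

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

// Persist the model to disk (overwrite() avoids failures if the path exists) ...
tokenClassifier.write.overwrite().save("/tmp/bert_token_classifier")

// ... and restore it later through the companion object.
val restored = BertForTokenClassification.load("/tmp/bert_token_classifier")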

Parameter setters

  1. def sentenceEndTokenId: Int

  2. def sentenceStartTokenId: Int

  3. def setBatchSize(size: Int): BertForTokenClassification.this.type

    Size of every batch.

    Definition Classes
    HasBatchedAnnotate
  4. def setCaseSensitive(value: Boolean): BertForTokenClassification.this.type

    Whether to lowercase tokens or not

    Definition Classes
    BertForTokenClassification → HasCaseSensitiveProperties
  5. def setConfigProtoBytes(bytes: Array[Int]): BertForTokenClassification.this.type

  6. def setLabels(value: Map[String, Int]): BertForTokenClassification.this.type

  7. def setMaxSentenceLength(value: Int): BertForTokenClassification.this.type

  8. def setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper], openvinoWrapper: Option[OpenvinoWrapper]): BertForTokenClassification

  9. def setSignatures(value: Map[String, String]): BertForTokenClassification.this.type

  10. def setVocabulary(value: Map[String, Int]): BertForTokenClassification.this.type
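
These setters are typically chained when importing an exported model via loadSavedModel on the companion object (referenced in the engine parameter above); a minimal sketch, assuming an exported model at an illustrative local path:

val tokenClassifier = BertForTokenClassification
  .loadSavedModel("/tmp/exported_bert_token_classifier", spark)
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)
  .setMaxSentenceLength(128)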

Parameter getters

  1. def getBatchSize: Int

    Size of every batch.

    Definition Classes
    HasBatchedAnnotate
  2. def getCaseSensitive: Boolean

    Definition Classes
    HasCaseSensitiveProperties
  3. def getConfigProtoBytes: Option[Array[Byte]]

  4. def getEngine: String

    Definition Classes
    HasEngine
  5. def getMaxSentenceLength: Int

  6. def getModelIfNotSet: BertClassification

  7. def getSignatures: Option[Map[String, String]]
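
Together with getClasses from the members list, these getters can be used to inspect a loaded model; a small sketch:

val tokenClassifier = BertForTokenClassification.pretrained()

println(tokenClassifier.getBatchSize)          // effective batch size
println(tokenClassifier.getCaseSensitive)      // case handling for index lookups
println(tokenClassifier.getMaxSentenceLength)  // maximum sentence length to process
println(tokenClassifier.getEngine)             // backend engine, e.g. tensorflow
tokenClassifier.getClasses.foreach(println)    // tags this model can predict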