class PerceptronModel extends AnnotatorModel[PerceptronModel] with HasSimpleAnnotate[PerceptronModel] with PerceptronPredictionUtils

Averaged Perceptron model for part-of-speech tagging. Assigns a POS tag to each word within a sentence.

This is the instantiated model of the PerceptronApproach. For training your own model, please see the documentation of that class.

Pretrained models can be loaded with the pretrained method of the companion object:

val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")

The default model is "pos_anc", if no name is provided.
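
To pin a specific model instead of relying on the default, the name and language can be passed explicitly. The following is a minimal sketch, assuming that "pos_anc" is published for the language "en" and that the model can be downloaded on first use; consult the Models Hub for the names and languages that actually exist.

```scala
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel

// Assumption: "pos_anc" is available for language "en".
// The first call downloads the model, so network access is required.
val posTagger = PerceptronModel.pretrained("pos_anc", "en")
  .setInputCols("document", "token")
  .setOutputCol("pos")
```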

For available pretrained models, please see the Models Hub. Pretrained pipelines are also available for this module; see Pipelines.

For extended examples of usage, see the Examples.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  posTagger
))

val data = Seq("Peter Pipers employees are picking pecks of pickled peppers").toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(pos) as pos").show(false)
+-------------------------------------------+
|pos                                        |
+-------------------------------------------+
|[pos, 0, 4, NNP, [word -> Peter], []]      |
|[pos, 6, 11, NNP, [word -> Pipers], []]    |
|[pos, 13, 21, NNS, [word -> employees], []]|
|[pos, 23, 25, VBP, [word -> are], []]      |
|[pos, 27, 33, VBG, [word -> picking], []]  |
|[pos, 35, 39, NNS, [word -> pecks], []]    |
|[pos, 41, 42, IN, [word -> of], []]        |
|[pos, 44, 50, JJ, [word -> pickled], []]   |
|[pos, 52, 58, NNS, [word -> peppers], []]  |
+-------------------------------------------+
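
For downstream processing it is often handier to have each token next to its tag than the full annotation struct. The following sketch assumes the `result` DataFrame from the example above and relies on the annotation fields `result` (the tag) and `metadata` (which holds the token under the "word" key, as the output above shows).

```scala
// Assumes `result` is the transformed DataFrame from the example above.
// Each POS annotation keeps the tag in `result` and the token text under
// the "word" key of `metadata`.
val tagged = result
  .selectExpr("explode(pos) as pos")
  .selectExpr("pos.metadata['word'] as word", "pos.result as tag")

tagged.show(false)
```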
Linear Supertypes
  1. PerceptronPredictionUtils
  2. PerceptronUtils
  3. HasSimpleAnnotate
  4. AnnotatorModel
  5. CanBeLazy
  6. RawAnnotator
  7. HasOutputAnnotationCol
  8. HasInputAnnotationCols
  9. HasOutputAnnotatorType
  10. ParamsAndFeaturesWritable
  11. HasFeatures
  12. DefaultParamsWritable
  13. MLWritable
  14. Model
  15. Transformer
  16. PipelineStage
  17. Logging
  18. Params
  19. Serializable
  20. Serializable
  21. Identifiable
  22. AnyRef
  23. Any

Parameters

A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.

  1. val model: StructFeature[AveragedPerceptron]

    POS model

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator types: TOKEN, DOCUMENT

    Definition Classes
    PerceptronModel → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output annotator types: POS

    Definition Classes
    PerceptronModel → HasOutputAnnotatorType

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def annotate(annotations: Seq[Annotation]): Seq[Annotation]

    One-to-one annotation from the tokens' perspective: each word is given a corresponding tag.

    annotations

    Annotations that correspond to inputAnnotationCols, generated by previous annotators, if any

    returns

    Any number of annotations processed for every input annotation. Not necessarily a one-to-one relationship.

    Definition Classes
    PerceptronModel → HasSimpleAnnotate
  2. final def clear(param: Param[_]): PerceptronModel.this.type
    Definition Classes
    Params
  3. def copy(extra: ParamMap): PerceptronModel

    Requirement for annotator copies

    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  4. def dfAnnotate: UserDefinedFunction

    Wraps annotate in a Spark SQL user-defined function so that it can operate on org.apache.spark.sql.Column

    returns

    UDF to be applied to inputCols, using this annotator's annotate function as part of the ML transformation

    Definition Classes
    HasSimpleAnnotate
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  10. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  11. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  12. def getInputCols: Array[String]

    returns

    input annotations columns currently used

    Definition Classes
    HasInputAnnotationCols
  13. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  14. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  15. final def getOutputCol: String

    Gets the name of the annotation column that will be generated

    Definition Classes
    HasOutputAnnotationCol
  16. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  17. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  18. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  19. def hasParent: Boolean
    Definition Classes
    Model
  20. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  21. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  22. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  23. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  24. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  25. var parent: Estimator[PerceptronModel]
    Definition Classes
    Model
  26. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  27. final def set[T](param: Param[T], value: T): PerceptronModel.this.type
    Definition Classes
    Params
  28. final def setInputCols(value: String*): PerceptronModel.this.type
    Definition Classes
    HasInputAnnotationCols
  29. def setInputCols(value: Array[String]): PerceptronModel.this.type

    Overrides the required annotator columns if they differ from the default

    Definition Classes
    HasInputAnnotationCols
  30. def setLazyAnnotator(value: Boolean): PerceptronModel.this.type
    Definition Classes
    CanBeLazy
  31. final def setOutputCol(value: String): PerceptronModel.this.type

    Overrides annotation column name when transforming

    Definition Classes
    HasOutputAnnotationCol
  32. def setParent(parent: Estimator[PerceptronModel]): PerceptronModel
    Definition Classes
    Model
  33. def tag(model: AveragedPerceptron, tokenizedSentences: Array[TokenizedSentence]): Array[TaggedSentence]

    Tags a group of sentences into POS-tagged sentences. The logic is to create a sentence context, run through every word, and evaluate its context. Based on how frequently a context appears around a word, that context is given a score, which is used to predict the tag. Some words are marked as unambiguous from the beginning.

    tokenizedSentences

    Sentences in the form of single-word tokens

    returns

    A list of sentences which have every word tagged

    Definition Classes
    PerceptronPredictionUtils
  34. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  35. final def transform(dataset: Dataset[_]): DataFrame

    Given that the requirements are met, this applies the ML transformation within a Pipeline or stand-alone. The output annotation is generated as a new column; previous annotations remain available separately. Metadata is built at the schema level to record the annotations' structural information outside their content.

    dataset

    Dataset[Row]

    Definition Classes
    AnnotatorModel → Transformer
  36. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  37. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  38. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit()

    Definition Classes
    RawAnnotator → PipelineStage
  39. val uid: String
    Definition Classes
    PerceptronModel → Identifiable
  40. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
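
The description of tag above (score each word's context, predict from the score, skip words that are unambiguous) can be illustrated with a toy sketch in plain Scala. This is not Spark NLP's implementation: the object name, feature templates, weights, and tag set below are all hypothetical and chosen only to show how perceptron-style scoring works.

```scala
// Toy illustration only: NOT Spark NLP's implementation. Feature templates,
// weights, and the tag set are hypothetical.
object PerceptronScoringSketch {
  // weights(feature)(tag) = learned (averaged) weight for that pair
  type Weights = Map[String, Map[String, Double]]

  // Build context features for the word at position i.
  def features(words: Seq[String], i: Int, prevTag: String): Seq[String] = {
    val prevWord = if (i > 0) words(i - 1) else "<START>"
    val nextWord = if (i < words.length - 1) words(i + 1) else "<END>"
    Seq(
      "bias",
      s"word=${words(i)}",
      s"suffix=${words(i).takeRight(3)}",
      s"prevTag=$prevTag",
      s"prevWord=$prevWord",
      s"nextWord=$nextWord"
    )
  }

  // Sum the weights of all active features per tag and return the best tag.
  def predict(weights: Weights, tags: Seq[String], feats: Seq[String]): String = {
    val scores = tags.map { tag =>
      val score = feats.map { f =>
        weights.getOrElse(f, Map.empty[String, Double]).getOrElse(tag, 0.0)
      }.sum
      tag -> score
    }
    scores.maxBy(_._2)._1
  }

  // Tag a sentence left to right; unambiguous words skip scoring entirely.
  def tag(
      weights: Weights,
      tags: Seq[String],
      unambiguous: Map[String, String],
      words: Seq[String]): Seq[(String, String)] = {
    var prevTag = "<START>"
    words.indices.map { i =>
      val predicted = unambiguous.getOrElse(
        words(i),
        predict(weights, tags, features(words, i, prevTag)))
      prevTag = predicted
      words(i) -> predicted
    }
  }
}
```

In a real averaged perceptron the weights come from training (see PerceptronApproach), where each weight is averaged over all training updates; here they would have to be supplied by hand.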

Parameter setters

  1. def setModel(targetModel: AveragedPerceptron): PerceptronModel.this.type

Parameter getters