package perceptron

Type Members

  1. case class AveragedPerceptron(tags: Array[String], taggedWordBook: Map[String, String], featuresWeight: Map[String, Map[String, Double]]) extends Serializable with Product

    tags

    Holds all unique tags seen during training

    taggedWordBook

    Contains unambiguous words and their tags

    featuresWeight

    Contains prediction information based on context frequencies
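
    The three fields above suggest how a tag could be chosen at prediction time. The following is a minimal sketch under stated assumptions, not the Spark NLP implementation: PerceptronSketch, its predict method, and the feature strings are hypothetical. It assumes that the word book is consulted first and that each tag is otherwise scored by summing the weights of the active features.

```scala
// Hypothetical mirror of the three documented fields; this is NOT the Spark NLP class.
final case class PerceptronSketch(
    tags: Array[String],
    taggedWordBook: Map[String, String],
    featuresWeight: Map[String, Map[String, Double]]) {

  // Sum of the weights that the active features assign to one tag.
  private def score(features: Seq[String], tag: String): Double =
    features.map(f => featuresWeight.getOrElse(f, Map.empty[String, Double]).getOrElse(tag, 0.0)).sum

  // Assumption: unambiguous words are answered from the word book;
  // every other word gets the highest-scoring tag (ties keep the earlier tag).
  def predict(word: String, features: Seq[String]): String =
    taggedWordBook.getOrElse(
      word,
      tags.reduceLeft((best, t) => if (score(features, t) > score(features, best)) t else best))
}
```

    With such a structure, a word found in taggedWordBook never touches featuresWeight, which keeps lookups for frequent unambiguous words cheap.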

  2. class PerceptronApproach extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils

    Trains an averaged Perceptron model to tag words with their part of speech. Assigns a POS tag to each word within a sentence.
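
    This page does not describe the training algorithm itself. As background, the sketch below shows one common formulation of averaged perceptron updates in plain Scala. It is an illustration of the general technique, not the code of PerceptronApproach: the class AveragedWeightsSketch and all of its members are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical trainer illustrating weight averaging; not part of Spark NLP.
final class AveragedWeightsSketch(tags: Seq[String]) {
  private val weights = mutable.Map.empty[(String, String), Double] // (feature, tag) -> current weight
  private val totals = mutable.Map.empty[(String, String), Double]  // weight accumulated over steps
  private val stamps = mutable.Map.empty[(String, String), Long]    // step of the last change
  private var step = 0L

  private def score(features: Seq[String], tag: String): Double =
    features.map(f => weights.getOrElse((f, tag), 0.0)).sum

  // Highest-scoring tag; ties keep the earlier tag.
  def predict(features: Seq[String]): String =
    tags.reduceLeft((best, t) => if (score(features, t) > score(features, best)) t else best)

  private def bump(key: (String, String), delta: Double): Unit = {
    val w = weights.getOrElse(key, 0.0)
    // Credit the old weight for every step it stayed unchanged, then apply the delta.
    totals(key) = totals.getOrElse(key, 0.0) + (step - stamps.getOrElse(key, 0L)) * w
    stamps(key) = step
    weights(key) = w + delta
  }

  // One training step: on a wrong guess, reward the true tag and penalize the guess.
  def update(truth: String, features: Seq[String]): Unit = {
    val guess = predict(features)
    step += 1
    if (guess != truth) features.foreach { f =>
      bump((f, truth), 1.0)
      bump((f, guess), -1.0)
    }
  }

  // Averaged weights over all steps seen so far.
  def averaged: Map[(String, String), Double] =
    weights.map { case (key, w) =>
      val total = totals.getOrElse(key, 0.0) + (step - stamps.getOrElse(key, 0L)) * w
      key -> total / step
    }.toMap
}
```

    Averaging dampens the effect of late, noisy updates: a weight that changed only near the end of training contributes little to the final value.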

    For pretrained models, please see PerceptronModel.

    The training data needs to be in a Spark DataFrame with a column of Annotations of type POS. Each Annotation needs to have its result member set to the POS tag and a "word" entry in its metadata member that maps to the word. This training DataFrame can easily be created with the helper class POS.

    POS().readDataset(spark, datasetPath).selectExpr("explode(tags) as tags").show(false)
    +---------------------------------------------+
    |tags                                         |
    +---------------------------------------------+
    |[pos, 0, 5, NNP, [word -> Pierre], []]       |
    |[pos, 7, 12, NNP, [word -> Vinken], []]      |
    |[pos, 14, 14, ,, [word -> ,], []]            |
    |[pos, 31, 34, MD, [word -> will], []]        |
    |[pos, 36, 39, VB, [word -> join], []]        |
    |[pos, 41, 43, DT, [word -> the], []]         |
    |[pos, 45, 49, NN, [word -> board], []]       |
                          ...
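
    To make the row layout above concrete, the sketch below builds records with the same begin, end, result and metadata values from one line of text. It is plain Scala without Spark and is not how the POS helper is implemented: PosRecord and PosLineSketch are hypothetical names, and the "word|TAG" input format with space-separated items is an assumption about the corpus file.

```scala
// Hypothetical stand-in for the fields shown in the table; not the Spark NLP Annotation class.
final case class PosRecord(begin: Int, end: Int, result: String, metadata: Map[String, String])

object PosLineSketch {
  // Splits "word|TAG word|TAG ..." and assigns character offsets within the
  // space-joined sentence text. `end` is inclusive, as in the table above.
  def parse(line: String, delimiter: Char = '|'): Seq[PosRecord] = {
    val pairs = line.trim.split("\\s+").toSeq.map { item =>
      // Use the last delimiter so that a word may itself contain the delimiter.
      val cut = item.lastIndexOf(delimiter)
      require(cut > 0, s"expected word${delimiter}TAG but got: $item")
      (item.substring(0, cut), item.substring(cut + 1))
    }
    // Start offset of each word: previous start + previous length + one space.
    val begins = pairs.scanLeft(0) { case (pos, (word, _)) => pos + word.length + 1 }
    pairs.zip(begins).map { case ((word, tag), begin) =>
      PosRecord(begin, begin + word.length - 1, tag, Map("word" -> word))
    }
  }
}
```

    For example, PosLineSketch.parse("Pierre|NNP Vinken|NNP ,|,") yields records with the offsets 0 to 5, 7 to 12 and 14 to 14, matching the first three rows of the table.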

    For extended examples of usage, see the Examples and PerceptronApproach tests.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.training.POS
    import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentence = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    
    val datasetPath = "src/test/resources/anc-pos-corpus-small/test-training.txt"
    val trainingPerceptronDF = POS().readDataset(spark, datasetPath)
    
    val trainedPos = new PerceptronApproach()
      .setInputCols("document", "token")
      .setOutputCol("pos")
      .setPosColumn("tags")
      .fit(trainingPerceptronDF)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentence,
      tokenizer,
      trainedPos
    ))
    
    val data = Seq("To be or not to be, is this the question?").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("pos.result").show(false)
    +--------------------------------------------------+
    |result                                            |
    +--------------------------------------------------+
    |[NNP, NNP, CD, JJ, NNP, NNP, ,, MD, VB, DT, CD, .]|
    +--------------------------------------------------+
  3. class PerceptronApproachDistributed extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils

    Distributed averaged Perceptron model to tag words with their part of speech.

    Assigns a POS tag to each word within a sentence. Its training data (train_pos) is a Spark Dataset of POS-format values with Annotation columns.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/pos/perceptron/DistributedPos.scala for further reference on how to use this API.

  4. class PerceptronModel extends AnnotatorModel[PerceptronModel] with HasSimpleAnnotate[PerceptronModel] with PerceptronPredictionUtils

    Averaged Perceptron model to tag words with their part of speech. Assigns a POS tag to each word within a sentence.

    This is the instantiated model of the PerceptronApproach. For training your own model, please see the documentation of that class.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val posTagger = PerceptronModel.pretrained()
      .setInputCols("document", "token")
      .setOutputCol("pos")

    If no name is provided, the default model is "pos_anc".

    For available pretrained models, please see the Models Hub. Pretrained pipelines are also available for this module; see Pipelines.

    For extended examples of usage, see the Examples.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val posTagger = PerceptronModel.pretrained()
      .setInputCols("document", "token")
      .setOutputCol("pos")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      posTagger
    ))
    
    val data = Seq("Peter Pipers employees are picking pecks of pickled peppers").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("explode(pos) as pos").show(false)
    +-------------------------------------------+
    |pos                                        |
    +-------------------------------------------+
    |[pos, 0, 4, NNP, [word -> Peter], []]      |
    |[pos, 6, 11, NNP, [word -> Pipers], []]    |
    |[pos, 13, 21, NNS, [word -> employees], []]|
    |[pos, 23, 25, VBP, [word -> are], []]      |
    |[pos, 27, 33, VBG, [word -> picking], []]  |
    |[pos, 35, 39, NNS, [word -> pecks], []]    |
    |[pos, 41, 42, IN, [word -> of], []]        |
    |[pos, 44, 50, JJ, [word -> pickled], []]   |
    |[pos, 52, 58, NNS, [word -> peppers], []]  |
    +-------------------------------------------+
  5. trait PerceptronPredictionUtils extends PerceptronUtils
  6. trait PerceptronTrainingUtils extends PerceptronUtils
  7. trait PerceptronUtils extends AnyRef
  8. trait ReadablePretrainedPerceptron extends ParamsAndFeaturesReadable[PerceptronModel] with HasPretrained[PerceptronModel]
  9. class StringMapStringDoubleAccumulator extends AccumulatorV2[(String, Map[String, Double]), Map[String, Map[String, Double]]]
  10. class TrainingPerceptronLegacy extends Serializable
  11. class TupleKeyLongDoubleMapAccumulator extends AccumulatorV2[((String, String), (Long, Double)), Map[(String, String), (Long, Double)]]

Value Members

  1. object PerceptronApproach extends DefaultParamsReadable[PerceptronApproach] with Serializable

    This is the companion object of PerceptronApproach. Please refer to that class for the documentation.

  2. object PerceptronApproachDistributed extends DefaultParamsReadable[PerceptronApproachDistributed] with Serializable

    This is the companion object of PerceptronApproachDistributed. Please refer to that class for the documentation.

  3. object PerceptronModel extends ReadablePretrainedPerceptron with Serializable

    This is the companion object of PerceptronModel. Please refer to that class for the documentation.
