package perceptron
Type Members
-
case class
AveragedPerceptron(tags: Array[String], taggedWordBook: Map[String, String], featuresWeight: Map[String, Map[String, Double]]) extends Serializable with Product
- tags
Holds all unique tags based on training
- taggedWordBook
Contains non ambiguous words and their tags
- featuresWeight
Contains prediction information based on context frequencies
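As a rough illustration of how these three fields could work together at prediction time, the sketch below looks a word up in the unambiguous-word book first and otherwise sums the per-tag weights of the active features. `SketchPerceptron`, `SketchPredict`, and the feature names are hypothetical stand-ins written for this example; they are not the library's classes, and the actual prediction logic may differ.

```scala
// Hypothetical stand-in mirroring the three documented fields; not the library class.
final case class SketchPerceptron(
    tags: Array[String],
    taggedWordBook: Map[String, String],
    featuresWeight: Map[String, Map[String, Double]])

object SketchPredict {
  // Assumes model.tags is non-empty.
  def predict(model: SketchPerceptron, word: String, features: Seq[String]): String =
    model.taggedWordBook.get(word) match {
      // Unambiguous words are tagged directly from the word book.
      case Some(tag) => tag
      case None =>
        // Sum, per tag, the weights of every active feature.
        val scores = features.foldLeft(Map.empty[String, Double]) { (acc, feature) =>
          model.featuresWeight
            .getOrElse(feature, Map.empty[String, Double])
            .foldLeft(acc) { case (a, (tag, weight)) =>
              a.updated(tag, a.getOrElse(tag, 0.0) + weight)
            }
        }
        // Pick the best-scoring tag; on ties the earlier tag in model.tags wins.
        model.tags.reduce { (best, candidate) =>
          if (scores.getOrElse(candidate, 0.0) > scores.getOrElse(best, 0.0)) candidate
          else best
        }
    }
}
```

Under these assumptions, a word found in `taggedWordBook` never consults `featuresWeight` at all, which is why that map only needs to hold words whose tag is unambiguous.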
-
class
PerceptronApproach extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils
Trains an averaged Perceptron model to tag words with their part of speech. Sets a POS tag to each word within a sentence.
For pretrained models please see the PerceptronModel.
The training data needs to be in a Spark DataFrame with a column consisting of Annotations of type POS. Each Annotation needs to have its member result set to the POS tag and a "word" entry in its member metadata that maps to the tagged word. This DataFrame can easily be created with the helper class POS:
POS().readDataset(spark, datasetPath).selectExpr("explode(tags) as tags").show(false)
+---------------------------------------------+
|tags                                         |
+---------------------------------------------+
|[pos, 0, 5, NNP, [word -> Pierre], []]       |
|[pos, 7, 12, NNP, [word -> Vinken], []]      |
|[pos, 14, 14, ,, [word -> ,], []]            |
|[pos, 31, 34, MD, [word -> will], []]        |
|[pos, 36, 39, VB, [word -> join], []]        |
|[pos, 41, 43, DT, [word -> the], []]         |
|[pos, 45, 49, NN, [word -> board], []]       |
...
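To make the shape of those rows concrete, here is a minimal plain-Scala sketch that turns one tagged line into annotation-like records with the same fields as the table above. It assumes the input is space-separated `word|TAG` items; that delimiter, `PosAnnotationSketch`, and `PosLineSketch` are assumptions of this example and not part of the library.

```scala
// Hypothetical, simplified stand-in for an Annotation of type POS.
final case class PosAnnotationSketch(
    annotatorType: String,
    begin: Int,
    end: Int,
    result: String,
    metadata: Map[String, String])

object PosLineSketch {
  // Assumes space-separated "word<delimiter>TAG" items; items without the delimiter are skipped.
  def parse(line: String, delimiter: String = "|"): Seq[PosAnnotationSketch] = {
    val pairs = line.split(" ").toSeq.filter(_.contains(delimiter)).map { item =>
      // Split on the last delimiter so a word may itself contain the delimiter.
      val cut = item.lastIndexOf(delimiter)
      (item.substring(0, cut), item.substring(cut + delimiter.length))
    }
    // Running start offset of each word in the sentence rebuilt with single spaces.
    val begins = pairs.scanLeft(0) { case (start, (word, _)) => start + word.length + 1 }
    pairs.zip(begins).map { case ((word, tag), begin) =>
      // result carries the tag; metadata maps "word" to the tagged word.
      PosAnnotationSketch("pos", begin, begin + word.length - 1, tag, Map("word" -> word))
    }
  }
}
```

For the line `Pierre|NNP Vinken|NNP ,|,` this yields the offsets 0 to 5, 7 to 12, and 14 to 14, in line with the first three rows shown above.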
For extended examples of usage, see the Examples and PerceptronApproach tests.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.training.POS
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val datasetPath = "src/test/resources/anc-pos-corpus-small/test-training.txt"
val trainingPerceptronDF = POS().readDataset(spark, datasetPath)

val trainedPos = new PerceptronApproach()
  .setInputCols("document", "token")
  .setOutputCol("pos")
  .setPosColumn("tags")
  .fit(trainingPerceptronDF)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  trainedPos
))

val data = Seq("To be or not to be, is this the question?").toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("pos.result").show(false)
+--------------------------------------------------+
|result                                            |
+--------------------------------------------------+
|[NNP, NNP, CD, JJ, NNP, NNP, ,, MD, VB, DT, CD, .]|
+--------------------------------------------------+
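For readers unfamiliar with the term, the following is a minimal textbook-style sketch of an averaged perceptron update in plain Scala: on a wrong guess the weights of the active features move toward the gold tag and away from the guessed tag, and the returned weights are the average of the weight vectors seen after each training step. `AveragedUpdateSketch`, its feature strings, and the naive averaging scheme are assumptions made for illustration; the training code behind PerceptronApproach may be organized quite differently.

```scala
object AveragedUpdateSketch {
  // (feature, tag) -> weight
  type Weights = Map[(String, String), Double]

  def score(w: Weights, features: Seq[String], tag: String): Double =
    features.map(f => w.getOrElse((f, tag), 0.0)).sum

  // Best-scoring tag; on ties the earlier tag wins. Assumes tags is non-empty.
  def predict(w: Weights, tags: Seq[String], features: Seq[String]): String =
    tags.reduce((a, b) => if (score(w, features, b) > score(w, features, a)) b else a)

  // One pass over the examples; returns the weights averaged over all steps.
  def train(tags: Seq[String], examples: Seq[(Seq[String], String)]): Weights = {
    val empty: Weights = Map.empty
    val (_, totals) = examples.foldLeft((empty, empty)) {
      case ((w, sum), (features, gold)) =>
        val guess = predict(w, tags, features)
        // Only a wrong guess changes the weights: +1 for the gold tag, -1 for the guess.
        val updated =
          if (guess == gold) w
          else features.foldLeft(w) { (acc, f) =>
            val up = acc.updated((f, gold), acc.getOrElse((f, gold), 0.0) + 1.0)
            up.updated((f, guess), up.getOrElse((f, guess), 0.0) - 1.0)
          }
        // Accumulate the current weights so they can be averaged at the end.
        val newSum = updated.foldLeft(sum) { case (s, (k, v)) =>
          s.updated(k, s.getOrElse(k, 0.0) + v)
        }
        (updated, newSum)
    }
    totals.map { case (k, v) => k -> v / examples.size }
  }
}
```

Averaging damps the effect of late, noisy updates: a weight that was useful for most of training keeps much of its value even if the last few examples pushed it back toward zero.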
-
class
PerceptronApproachDistributed extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils
Distributed Averaged Perceptron model to tag words with their part of speech.
Sets a POS tag to each word within a sentence. Its training data (train_pos) is a Spark Dataset of POS-format values with Annotation columns.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/pos/perceptron/DistributedPos.scala for further reference on how to use this API.
-
class
PerceptronModel extends AnnotatorModel[PerceptronModel] with HasSimpleAnnotate[PerceptronModel] with PerceptronPredictionUtils
Averaged Perceptron model to tag words with their part of speech. Sets a POS tag to each word within a sentence.
This is the instantiated model of the PerceptronApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with pretrained of the companion object:
val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")
The default model is "pos_anc", if no name is provided.
For available pretrained models please see the Models Hub. Additionally, pretrained pipelines are available for this module, see Pipelines.
For extended examples of usage, see the Examples.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  posTagger
))

val data = Seq("Peter Pipers employees are picking pecks of pickled peppers").toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(pos) as pos").show(false)
+-------------------------------------------+
|pos                                        |
+-------------------------------------------+
|[pos, 0, 4, NNP, [word -> Peter], []]      |
|[pos, 6, 11, NNP, [word -> Pipers], []]    |
|[pos, 13, 21, NNS, [word -> employees], []]|
|[pos, 23, 25, VBP, [word -> are], []]      |
|[pos, 27, 33, VBG, [word -> picking], []]  |
|[pos, 35, 39, NNS, [word -> pecks], []]    |
|[pos, 41, 42, IN, [word -> of], []]        |
|[pos, 44, 50, JJ, [word -> pickled], []]   |
|[pos, 52, 58, NNS, [word -> peppers], []]  |
+-------------------------------------------+
- trait PerceptronPredictionUtils extends PerceptronUtils
- trait PerceptronTrainingUtils extends PerceptronUtils
- trait PerceptronUtils extends AnyRef
- trait ReadablePretrainedPerceptron extends ParamsAndFeaturesReadable[PerceptronModel] with HasPretrained[PerceptronModel]
- class StringMapStringDoubleAccumulator extends AccumulatorV2[(String, Map[String, Double]), Map[String, Map[String, Double]]]
- class TrainingPerceptronLegacy extends Serializable
- class TupleKeyLongDoubleMapAccumulator extends AccumulatorV2[((String, String), (Long, Double)), Map[(String, String), (Long, Double)]]
Value Members
-
object
PerceptronApproach extends DefaultParamsReadable[PerceptronApproach] with Serializable
This is the companion object of PerceptronApproach. Please refer to that class for the documentation.
-
object
PerceptronApproachDistributed extends DefaultParamsReadable[PerceptronApproachDistributed] with Serializable
This is the companion object of PerceptronApproachDistributed. Please refer to that class for the documentation.
-
object
PerceptronModel extends ReadablePretrainedPerceptron with Serializable
This is the companion object of PerceptronModel. Please refer to that class for the documentation.