Description
A part-of-speech (POS) classifier predicts a grammatical label for every token in the input text. This model is implemented with an averaged perceptron architecture.
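To give a feel for what "averaged perceptron" means, here is a minimal toy sketch in plain Python. It is not the Spark NLP implementation. The class name, the `features` template, and the naive averaging scheme are all assumptions made for illustration, and the single training sentence reuses the tokens and tags shown under Results rather than real treebank data.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Toy multiclass averaged perceptron over sparse binary features."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)  # (feature, tag) -> current weight
        self.totals = defaultdict(float)   # (feature, tag) -> weight summed over all steps
        self.steps = 0

    def predict(self, features):
        def score(tag):
            return sum(self.weights.get((f, tag), 0.0) for f in features)
        # max() keeps the first best tag, so ties resolve alphabetically.
        return max(self.classes, key=score)

    def update(self, features, truth):
        guess = self.predict(features)
        if guess != truth:
            for f in features:
                self.weights[(f, truth)] += 1.0  # reward the correct tag
                self.weights[(f, guess)] -= 1.0  # penalise the wrong guess
        # Naive averaging: add the current weights to the running totals at
        # every step. Production taggers usually do this lazily with timestamps.
        for key, value in self.weights.items():
            self.totals[key] += value
        self.steps += 1

    def average(self):
        """Replace each weight with its mean value over all training steps."""
        self.weights = {key: total / self.steps for key, total in self.totals.items()}


def features(words, i):
    """Hypothetical feature template; real taggers use a much richer one."""
    feats = ["bias", "word=" + words[i].lower(),
             "prev=" + (words[i - 1].lower() if i > 0 else "<s>")]
    if words[i][0].isupper():
        feats.append("title")
    return feats


# One toy training sentence: tokens and tags copied from the Results section.
words = ["Dia", "duit", "ó", "John", "Labs", "Sneachta", "!"]
gold = ["NOUN", "NOUN", "ADP", "PROPN", "PROPN", "NOUN", "PUNCT"]

model = AveragedPerceptron(set(gold))
for _ in range(5):  # five passes over the toy data
    for i, tag in enumerate(gold):
        model.update(features(words, i), tag)
model.average()

predicted = [model.predict(features(words, i)) for i in range(len(words))]
print(list(zip(words, predicted)))
```

Averaging the weights over all training steps, instead of keeping only the final values, tends to make the tagger less sensitive to the last few examples it saw. With a single sentence the sketch only memorises its input, so it says nothing about the accuracy of the pretrained model.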
Predicted Entities
- ADP
- NOUN
- DET
- AUX
- PRON
- VERB
- SCONJ
- PART
- ADV
- PUNCT
- CCONJ
- ADJ
- PROPN
- NUM
- X
- SYM
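The labels above appear to be Universal Dependencies UPOS tags. This is an inference from the tag set and the `ud` in the model name, not something the card states. Under that assumption, a small lookup table can make the model output easier to read. The names `UPOS_DESCRIPTIONS` and `describe` are hypothetical helpers, not part of Spark NLP.

```python
# Short glosses for the 16 predicted labels, following the
# Universal Dependencies UPOS inventory (assumed to apply here).
UPOS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe(tag):
    """Return a readable gloss for a tag, or a fallback for unknown labels."""
    return UPOS_DESCRIPTIONS.get(tag, "unknown tag")


print(describe("PROPN"))
```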
How to use
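# Python. Assumes an active SparkSession named `spark` (for example, from sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, PerceptronModel
document_assembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")
sentence_detector = SentenceDetector() \
  .setInputCols(["document"]) \
  .setOutputCol("sentence")
# The POS tagger reads a "token" column, so a tokenizer stage is required.
tokenizer = Tokenizer() \
  .setInputCols(["sentence"]) \
  .setOutputCol("token")
pos = PerceptronModel.pretrained("pos_ud_idt", "ga") \
  .setInputCols(["sentence", "token"]) \
  .setOutputCol("pos")
pipeline = Pipeline(stages=[
  document_assembler,
  sentence_detector,
  tokenizer,
  pos
])
example = spark.createDataFrame([["Dia duit ó John Labs Sneachta! "]], ["text"])
result = pipeline.fit(example).transform(example)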
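// Scala. Assumes an active SparkSession named `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._
val document_assembler = new DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")
val sentence_detector = new SentenceDetector()
        .setInputCols("document")
        .setOutputCol("sentence")
// The POS tagger reads a "token" column, so a tokenizer stage is required.
val tokenizer = new Tokenizer()
        .setInputCols("sentence")
        .setOutputCol("token")
val pos = PerceptronModel.pretrained("pos_ud_idt", "ga")
        .setInputCols(Array("sentence", "token"))
        .setOutputCol("pos")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, pos))
val data = Seq("Dia duit ó John Labs Sneachta! ").toDF("text")
val result = pipeline.fit(data).transform(data)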
# NLU (Python)
import nlu
text = ["Dia duit ó John Labs Sneachta! "]
token_df = nlu.load('ga.pos').predict(text)
token_df
Results
      token    pos
                  
0       Dia   NOUN
1      duit   NOUN
2         ó    ADP
3      John  PROPN
4      Labs  PROPN
5  Sneachta   NOUN
6         !  PUNCT
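Once the token and tag pairs are available as plain Python values, they can be summarised with the standard library. The sketch below hard-codes the rows from the table above instead of reading them from a Spark or NLU DataFrame, so the variable names `rows`, `tag_counts`, and `tokens_by_tag` are illustrative only.

```python
from collections import Counter, defaultdict

# (token, pos) pairs copied from the Results table above.
rows = [
    ("Dia", "NOUN"),
    ("duit", "NOUN"),
    ("ó", "ADP"),
    ("John", "PROPN"),
    ("Labs", "PROPN"),
    ("Sneachta", "NOUN"),
    ("!", "PUNCT"),
]

# How often each tag was predicted.
tag_counts = Counter(tag for _, tag in rows)

# Which tokens received each tag, in order of appearance.
tokens_by_tag = defaultdict(list)
for token, tag in rows:
    tokens_by_tag[tag].append(token)

print(dict(tag_counts))
print(dict(tokens_by_tag))
```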
Model Information
| Model Name: | pos_ud_idt | 
| Compatibility: | Spark NLP 3.0.0+ | 
| License: | Open Source | 
| Edition: | Official | 
| Input Labels: | [document, token] | 
| Output Labels: | [pos] | 
| Language: | ga |