Description
Untyped dependency parser, trained on the CoNLL dataset.
Dependency parsing is the task of extracting a dependency parse of a sentence: a representation of its grammatical structure that defines the relationships between "head" words and the words that modify those heads.
Example: for the sentence "I prefer the morning flight through Denver", the relations among the words can be drawn as directed, labeled arcs from heads to dependents:

- root → prefer
- nsubj: prefer → I
- dobj: prefer → flight
- det: flight → the
- nmod: flight → morning
- nmod: flight → Denver
- case: Denver → through
How to use
Python:

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes a Spark session with Spark NLP is already running, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDetector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
posTagger = PerceptronModel.pretrained().setInputCols(["token", "sentence"]).setOutputCol("pos")
dependencyParser = DependencyParserModel.pretrained().setInputCols(["sentence", "pos", "token"]).setOutputCol("dependency")
typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(["token", "pos", "dependency"]).setOutputCol("labdep")
pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser])

# Create the data frame
df = spark.createDataFrame([["Dependencies represents relationships betweens words in a Sentence"]]).toDF("text")

result = pipeline.fit(df).transform(df)
result.select("dependency.result").show(truncate=False)
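For quick inference on raw strings, the fitted pipeline can also be wrapped in a LightPipeline, which skips the DataFrame round-trip. A minimal sketch, reusing the pipeline and df defined above:

from sparknlp.base import LightPipeline

# Annotate a plain string; returns a dict mapping each output column to a list of results
light = LightPipeline(pipeline.fit(df))
annotations = light.annotate("Dependencies represents relationships betweens words in a Sentence")
print(annotations["dependency"])  # predicted head words, one per token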
Scala:

import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._
val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val sentenceDetector = new SentenceDetector().setInputCols(Array("document")).setOutputCol("sentence")
val tokenizer = new Tokenizer().setInputCols(Array("sentence")).setOutputCol("token")
val posTagger = PerceptronModel.pretrained().setInputCols(Array("token", "sentence")).setOutputCol("pos")
val dependencyParser = DependencyParserModel.pretrained().setInputCols(Array("sentence", "pos", "token")).setOutputCol("dependency")
val typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(Array("token", "pos", "dependency")).setOutputCol("labdep")
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser))
val df = Seq("Dependencies represents relationships betweens words in a Sentence").toDF("text")
val result = pipeline.fit(df).transform(df)
result.select("dependency.result").show(false)
NLU:

import nlu
nlu.load("dep.untyped").predict("Dependencies represents relationships betweens words in a Sentence")
Results
+---------------------------------------------------------------------------------+
|result |
+---------------------------------------------------------------------------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|
+---------------------------------------------------------------------------------+
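Each element of the result array is the predicted head word of the token at the same position, with ROOT marking the sentence root. As a minimal sketch for reading the output side by side with the tokens (reusing the result DataFrame from the pipeline above):

# Pair every token with its predicted head word
row = result.select("token.result", "dependency.result").first()
for token, head in zip(row[0], row[1]):
    print(f"{head} -> {token}")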
Model Information
Model Name: dependency_conllu
Compatibility: Spark NLP 3.4.4+
License: Open Source
Edition: Official
Input Labels: [sentence, pos, token]
Output Labels: [dep_root]
Language: en
Size: 17.5 MB
Data Source
CoNLL