Untyped Dependency Parsing for English

Description

Untyped dependency parser for English, trained on the CoNLL-U corpus.

Dependency parsing is the task of extracting a dependency parse of a sentence, representing its grammatical structure by defining relationships between “head” words and the words that modify those heads.

Example:

For the sentence “I prefer the morning flight through Denver”, relations among the words are illustrated with directed, labeled arcs from heads to dependents: *prefer* is the root, and arcs such as nsubj, dobj, det, nmod, and case link the remaining words to their heads.
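An untyped dependency parse like this one can be represented compactly as one head index per token. The sketch below is illustrative only, assuming the standard analysis of this example sentence; it is not produced by the model.

```python
# One 1-based head index per token; 0 marks the ROOT of the sentence.
# Head indices assume the standard analysis of this example sentence.
tokens = "I prefer the morning flight through Denver".split()
heads = [2, 0, 5, 5, 2, 7, 5]

# Build (head_word, dependent_word) arcs from the index list
arcs = [("ROOT" if h == 0 else tokens[h - 1], w) for w, h in zip(tokens, heads)]
```

Each arc points from a head to one of its dependents, mirroring the labeled arcs drawn above the sentence.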


How to use

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler     = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDetector      = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer             = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
posTagger             = PerceptronModel.pretrained().setInputCols(["token", "sentence"]).setOutputCol("pos")
dependencyParser      = DependencyParserModel.pretrained().setInputCols(["sentence", "pos", "token"]).setOutputCol("dependency")
typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(["token", "pos", "dependency"]).setOutputCol("labdep")
pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser])

# Create data frame
df = spark.createDataFrame([["Dependencies represents relationships betweens words in a Sentence"]]).toDF("text")
result = pipeline.fit(df).transform(df)
result.select("dependency.result").show(truncate=False)



import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler     = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val sentenceDetector      = new SentenceDetector().setInputCols(Array("document")).setOutputCol("sentence")
val tokenizer             = new Tokenizer().setInputCols(Array("sentence")).setOutputCol("token")
val posTagger             = PerceptronModel.pretrained().setInputCols(Array("token", "sentence")).setOutputCol("pos")
val dependencyParser      = DependencyParserModel.pretrained().setInputCols(Array("sentence", "pos", "token")).setOutputCol("dependency")
val typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(Array("token", "pos", "dependency")).setOutputCol("labdep")
val pipeline              = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser))
val df = Seq("Dependencies represents relationships betweens words in a Sentence").toDF("text")
val result = pipeline.fit(df).transform(df)
result.select("dependency.result").show(false)

import nlu
nlu.load("dep.untyped").predict("Dependencies represents relationships betweens words in a Sentence")

Results

+---------------------------------------------------------------------------------+
|result                                                                           |
+---------------------------------------------------------------------------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|
+---------------------------------------------------------------------------------+
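The `result` array lists, for each token of the input sentence, the head word it attaches to (`ROOT` for the sentence root). A minimal pure-Python sketch of pairing the tokens with these heads to recover the untyped arcs:

```python
# Tokens of the example sentence and the head words from "dependency.result"
tokens = ["Dependencies", "represents", "relationships", "betweens",
          "words", "in", "a", "Sentence"]
heads = ["ROOT", "Dependencies", "represents", "words",
         "relationships", "Sentence", "Sentence", "words"]

# Each token paired with its head gives the (head, dependent) arcs of the parse
arcs = list(zip(heads, tokens))
for head, dep in arcs:
    print(f"{head} -> {dep}")
```

Running this prints one arc per token, e.g. `ROOT -> Dependencies` and `Dependencies -> represents`.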

Model Information

Model Name: dependency_conllu
Compatibility: Spark NLP 3.4.4+
License: Open Source
Edition: Official
Input Labels: [sentence, pos, token]
Output Labels: [dep_root]
Language: en
Size: 17.5 MB

Data Source

CoNLL-U