`sparknlp.annotator.dependency.dependency_parser`#

Contains classes for the DependencyParser.

Module Contents#

Classes#

`DependencyParserApproach`	Trains an unlabeled parser that finds a grammatical relation between two words in a sentence.
`DependencyParserModel`	Unlabeled parser that finds a grammatical relation between two words in a sentence.

class DependencyParserApproach[source]#

Trains an unlabeled parser that finds a grammatical relation between two words in a sentence.

For instantiated/pretrained models, see DependencyParserModel.

A dependency parser provides information about the relationships between words. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words modify (describe) the subject. This can help you find precise answers to specific questions.

The required training data can be set in two different ways (only one can be chosen for a particular model):

Dependency treebank in the Penn Treebank format set with setDependencyTreeBank
Dataset in the CoNLL-U format set with setConllU

Apart from that, no additional training data is needed.
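To make the CoNLL-U option more concrete, here is a minimal pure-Python sketch of how that format encodes unlabeled dependencies. The three-word sentence is hand-written for illustration and does not come from a real treebank, and `read_heads` is a hypothetical helper, not part of Spark NLP. It relies only on the standard CoNLL-U layout of ten tab-separated columns, where column 7 (HEAD) holds the ID of each word's head and 0 marks the root.

```python
# A tiny hand-written CoNLL-U sentence (illustrative, not from a real treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
CONLLU_SAMPLE = "\n".join([
    "# text = Dogs chase cats",
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tchase\tchase\tVERB\tVBP\t_\t0\troot\t_\t_",
    "3\tcats\tcat\tNOUN\tNNS\t_\t2\tobj\t_\t_",
])


def read_heads(conllu_text):
    """Return (form, head_form) pairs for one CoNLL-U sentence."""
    rows = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        rows.append((int(cols[0]), cols[1], int(cols[6])))
    forms = {idx: form for idx, form, _ in rows}
    # HEAD 0 marks the root of the sentence.
    return [(form, "ROOT" if head == 0 else forms[head]) for _, form, head in rows]


pairs = read_heads(CONLLU_SAMPLE)
print(pairs)
```

The resulting word/head pairs have the same shape as the unlabeled output of the parser: each word is paired with its head, and the DEPREL column is ignored.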

Input Annotation types	Output Annotation type
`DOCUMENT, POS, TOKEN`	`DEPENDENCY`

Parameters:

dependencyTreeBank: Dependency treebank source files
conllU: Universal Dependencies source files
numberOfIterations: Number of training iterations; more iterations tend to converge to better accuracy. Defaults to 10.

See also

TypedDependencyParserApproach: to extract labels for the dependencies

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentence = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> posTagger = PerceptronModel.pretrained() \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("pos")
>>> dependencyParserApproach = DependencyParserApproach() \
...     .setInputCols(["sentence", "pos", "token"]) \
...     .setOutputCol("dependency") \
...     .setDependencyTreeBank("src/test/resources/parser/unlabeled/dependency_treebank")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     sentence,
...     tokenizer,
...     posTagger,
...     dependencyParserApproach
... ])
>>> emptyDataSet = spark.createDataFrame([[""]]).toDF("text")
>>> pipelineModel = pipeline.fit(emptyDataSet)

No additional training data is needed; the dependency parser relies only on the dependency treebank or the CoNLL-U data.

inputAnnotatorTypes[source]#

outputAnnotatorType = 'dependency'[source]#

dependencyTreeBank[source]#

conllU[source]#

numberOfIterations[source]#

setNumberOfIterations(value)[source]#

Sets the number of training iterations; more iterations tend to converge to better accuracy. Defaults to 10.

Parameters:

value (int): Number of iterations

setDependencyTreeBank(path, read_as=ReadAs.TEXT, options={'key': 'value'})[source]#

Sets dependency treebank source files.

Parameters:

path (str): Path to the source files
read_as (str, optional): How to read the file, by default ReadAs.TEXT
options (dict, optional): Options to read the resource, by default {"key": "value"}

setConllU(path, read_as=ReadAs.TEXT, options={'key': 'value'})[source]#

Sets Universal Dependencies source files.

Parameters:

path (str): Path to the source files
read_as (str, optional): How to read the file, by default ReadAs.TEXT
options (dict, optional): Options to read the resource, by default {"key": "value"}
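As a rough sketch of how these setters might be combined, the following configures the approach with CoNLL-U data instead of a treebank. `build_conllu_parser` and its `conllu_path` argument are hypothetical names introduced here for illustration; only the setters documented above are used, and the import is deferred so that the function can be defined without a running Spark session.

```python
def build_conllu_parser(conllu_path, iterations=10):
    """Return an untrained DependencyParserApproach configured for CoNLL-U data.

    `conllu_path` is a hypothetical path supplied by the caller.
    """
    # Imported lazily: creating the annotator requires an active Spark session.
    from sparknlp.annotator import DependencyParserApproach

    return (
        DependencyParserApproach()
        .setInputCols(["sentence", "pos", "token"])
        .setOutputCol("dependency")
        .setConllU(conllu_path)             # CoNLL-U source instead of a treebank
        .setNumberOfIterations(iterations)  # documented default is 10
    )
```

The returned stage would take the place of `dependencyParserApproach` in the training pipeline shown in the Examples. Since only one training source can be chosen per model, the sketch does not call setDependencyTreeBank.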

class DependencyParserModel(classname='com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserModel', java_model=None)[source]#

Unlabeled parser that finds a grammatical relation between two words in a sentence.

This is the instantiated model of the DependencyParserApproach. For training your own model, please see the documentation of that class.

Pretrained models can be loaded with pretrained() of the companion object:

>>> dependencyParser = DependencyParserModel.pretrained() \
...     .setInputCols(["sentence", "pos", "token"]) \
...     .setOutputCol("dependency")

The default model is "dependency_conllu" if no name is provided. For available pretrained models, see the Models Hub.

For extended examples of usage, see the Examples.

Input Annotation types	Output Annotation type
`DOCUMENT, POS, TOKEN`	`DEPENDENCY`

Parameters:

perceptron: Dependency parsing perceptron features

See also

TypedDependencyParserModel: to extract labels for the dependencies

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentence = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> posTagger = PerceptronModel.pretrained() \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("pos")
>>> dependencyParser = DependencyParserModel.pretrained() \
...     .setInputCols(["sentence", "pos", "token"]) \
...     .setOutputCol("dependency")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     sentence,
...     tokenizer,
...     posTagger,
...     dependencyParser
... ])
>>> data = spark.createDataFrame([[
...     "Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent " +
...     "firm Federal Mogul."
... ]]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("explode(arrays_zip(token.result, dependency.result)) as cols") \
...     .selectExpr("cols['0'] as token", "cols['1'] as dependency").show(8, truncate = False)
+------------+------------+
|token       |dependency  |
+------------+------------+
|Unions      |ROOT        |
|representing|workers     |
|workers     |Unions      |
|at          |Turner      |
|Turner      |workers     |
|Newall      |say         |
|say         |Unions      |
|they        |disappointed|
+------------+------------+
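Each row of the output pairs a token with the word the parser chose as its head, and the root of the sentence is marked with ROOT. One way to read such a result, sketched below in plain Python, is to group tokens by their head. The pairs are copied from the eight rows shown above; the variable names are illustrative, and grouping by word string is a simplification that would merge repeated words in a longer sentence.

```python
from collections import defaultdict

# (token, head) pairs copied from the output table above.
rows = [
    ("Unions", "ROOT"),
    ("representing", "workers"),
    ("workers", "Unions"),
    ("at", "Turner"),
    ("Turner", "workers"),
    ("Newall", "say"),
    ("say", "Unions"),
    ("they", "disappointed"),
]

# Map each head word to the tokens that depend on it.
dependents = defaultdict(list)
for token, head in rows:
    dependents[head].append(token)

root = dependents["ROOT"][0]
print(root, dict(dependents))
```

Because the parser is unlabeled, this only recovers which words attach to which; the relation types require the typed dependency parser mentioned under See also.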

name = 'DependencyParserModel'[source]#

inputAnnotatorTypes[source]#

outputAnnotatorType = 'dependency'[source]#

perceptron[source]#

static pretrained(name='dependency_conllu', lang='en', remote_loc=None)[source]#

Downloads and loads a pretrained model.

Parameters:

name (str, optional): Name of the pretrained model, by default "dependency_conllu"
lang (str, optional): Language of the pretrained model, by default "en"
remote_loc (str, optional): Remote address of the resource, by default None. If not set, Spark NLP's repositories are used.

Returns:

DependencyParserModel: The restored model
