sparknlp.annotator.dependency.typed_dependency_parser#

Contains classes for the TypedDependencyParser.

Module Contents#

Classes#

TypedDependencyParserApproach

Labeled parser that finds a grammatical relation between two words in a sentence.

TypedDependencyParserModel

Labeled parser that finds a grammatical relation between two words in a sentence.

class TypedDependencyParserApproach[source]#

Labeled parser that finds a grammatical relation between two words in a sentence. Its input is either a CoNLL 2009 or CoNLL-U dataset.

For instantiated/pretrained models, see TypedDependencyParserModel.

Dependency parsers provide information about word relationships. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words are modifying (describing) the subject. This can help you find precise answers to specific questions.

The parser requires the dependent tokens beforehand, provided e.g. by DependencyParser. The required training data can be set in two different ways (only one can be chosen for a particular model); see the sketch below:

A dataset in CoNLL 2009 format, set with setConll2009().

A dataset in Universal Dependencies (CoNLL-U) format, set with setConllU().

Apart from that, no additional training data is needed.
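
A minimal sketch of the two alternatives (the file paths are hypothetical; call only one of the two setters on a given instance):

>>> from sparknlp.annotator import TypedDependencyParserApproach
>>> # Option 1: training data in CoNLL 2009 format (hypothetical path)
>>> conll2009Parser = TypedDependencyParserApproach() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type") \
...     .setConll2009("path/to/train.conll2009.txt")
>>> # Option 2: training data in CoNLL-U format (hypothetical path)
>>> conllUParser = TypedDependencyParserApproach() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type") \
...     .setConllU("path/to/train.conllu.txt")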

Input Annotation types: TOKEN, POS, DEPENDENCY

Output Annotation type: LABELED_DEPENDENCY

Parameters:
conll2009

Path to a file in CoNLL 2009 format

conllU

Path to Universal Dependencies source files

numberOfIterations

Number of training iterations; more iterations converge to better accuracy

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentence = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> posTagger = PerceptronModel.pretrained() \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("pos")
>>> dependencyParser = DependencyParserModel.pretrained() \
...     .setInputCols(["sentence", "pos", "token"]) \
...     .setOutputCol("dependency")
>>> typedDependencyParser = TypedDependencyParserApproach() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type") \
...     .setConllU("src/test/resources/parser/labeled/train_small.conllu.txt") \
...     .setNumberOfIterations(1)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     sentence,
...     tokenizer,
...     posTagger,
...     dependencyParser,
...     typedDependencyParser
... ])

Additional training data is not needed; the typed dependency parser relies on the CoNLL-U data only, so the pipeline can be fitted on an empty dataset.

>>> emptyDataSet = spark.createDataFrame([[""]]).toDF("text")
>>> pipelineModel = pipeline.fit(emptyDataSet)
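
Once fitted, the resulting PipelineModel can be applied to real text; a minimal sketch (the input sentence is illustrative):

>>> data = spark.createDataFrame([["I saw the book on the table."]]).toDF("text")
>>> result = pipelineModel.transform(data)
>>> result.select("dependency_type.result").show(truncate=False)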
setConll2009(path, read_as=ReadAs.TEXT, options={'key': 'value'})[source]#

Sets path to file with CoNLL 2009 format.

Parameters:
path : str

Path to the resource

read_as : str, optional

How to read the resource, by default ReadAs.TEXT

options : dict, optional

Options for reading the resource, by default {"key": "value"}
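
For example, the reader arguments can be passed explicitly alongside the path; a sketch with a hypothetical path, assuming ReadAs is importable from sparknlp.common:

>>> from sparknlp.common import ReadAs
>>> typedDependencyParser = TypedDependencyParserApproach() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type") \
...     .setConll2009("path/to/train.conll2009.txt", read_as=ReadAs.TEXT)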

setConllU(path, read_as=ReadAs.TEXT, options={'key': 'value'})[source]#

Sets path to Universal Dependencies source files.

Parameters:
path : str

Path to the resource

read_as : str, optional

How to read the resource, by default ReadAs.TEXT

options : dict, optional

Options for reading the resource, by default {"key": "value"}

setNumberOfIterations(value)[source]#

Sets the number of training iterations; more iterations converge to better accuracy.

Parameters:
value : int

Number of iterations in training


class TypedDependencyParserModel(classname='com.johnsnowlabs.nlp.annotators.parser.typdep.TypedDependencyParserModel', java_model=None)[source]#

Labeled parser that finds a grammatical relation between two words in a sentence. Its input is either a CoNLL 2009 or CoNLL-U dataset.

Dependency parsers provide information about word relationships. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words are modifying (describing) the subject. This can help you find precise answers to specific questions.

The parser requires the dependent tokens beforehand, provided e.g. by DependencyParser.

Pretrained models can be loaded with pretrained() of the companion object:

>>> typedDependencyParser = TypedDependencyParserModel.pretrained() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type")

If no name is provided, the default model "dependency_typed_conllu" is used. For available pretrained models, please see the Models Hub.

For extended examples of usage, see the Examples.

Input Annotation types: TOKEN, POS, DEPENDENCY

Output Annotation type: LABELED_DEPENDENCY

Parameters:
None

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentence = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> posTagger = PerceptronModel.pretrained() \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("pos")
>>> dependencyParser = DependencyParserModel.pretrained() \
...     .setInputCols(["sentence", "pos", "token"]) \
...     .setOutputCol("dependency")
>>> typedDependencyParser = TypedDependencyParserModel.pretrained() \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     sentence,
...     tokenizer,
...     posTagger,
...     dependencyParser,
...     typedDependencyParser
... ])
>>> data = spark.createDataFrame([[
...     "Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent " +
...       "firm Federal Mogul."
... ]]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("explode(arrays_zip(token.result, dependency.result, dependency_type.result)) as cols") \
...     .selectExpr("cols['0'] as token", "cols['1'] as dependency", "cols['2'] as dependency_type") \
...     .show(8, truncate = False)
+------------+------------+---------------+
|token       |dependency  |dependency_type|
+------------+------------+---------------+
|Unions      |ROOT        |root           |
|representing|workers     |amod           |
|workers     |Unions      |flat           |
|at          |Turner      |case           |
|Turner      |workers     |flat           |
|Newall      |say         |nsubj          |
|say         |Unions      |parataxis      |
|they        |disappointed|nsubj          |
+------------+------------+---------------+
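
Each entry of the dependency_type column is a full annotation; a sketch of inspecting its fields, assuming the standard Spark NLP annotation schema (begin, end, result, metadata):

>>> from pyspark.sql import functions as F
>>> result.select(F.explode("dependency_type").alias("dep")) \
...     .select("dep.begin", "dep.end", "dep.result", "dep.metadata") \
...     .show(8, truncate=False)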
static pretrained(name='dependency_typed_conllu', lang='en', remote_loc=None)[source]#

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default "dependency_typed_conllu"

lang : str, optional

Language of the pretrained model, by default "en"

remote_loc : str, optional

Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.

Returns:
TypedDependencyParserModel

The restored model
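
A sketch of the equivalent call with the defaults written out explicitly:

>>> typedDependencyParser = TypedDependencyParserModel.pretrained("dependency_typed_conllu", lang="en") \
...     .setInputCols(["dependency", "pos", "token"]) \
...     .setOutputCol("dependency_type")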