package dep
Type Members
- class DependencyParserApproach extends AnnotatorApproach[DependencyParserModel]
Trains an unlabeled parser that finds grammatical relations between two words in a sentence.
For instantiated/pretrained models, see DependencyParserModel.
A dependency parser provides information about the relationships between words. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words modify (describe) the subject. This can help you find precise answers to specific questions.
The required training data can be set in two different ways (only one can be chosen for a particular model):
- Dependency treebank in the Penn Treebank format, set with setDependencyTreeBank
- Dataset in the CoNLL-U format, set with setConllU
Apart from that, no additional training data is needed.
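To make the second option more concrete, here is a minimal sketch of the information a CoNLL-U file carries for an unlabeled parse. It relies only on the published CoNLL-U layout (ten tab-separated columns, with FORM in column 2 and HEAD in column 7, and HEAD 0 marking the root). The object name ConlluHeads, its helpers, and the three-word sample sentence are hypothetical and written for illustration; this is not how the library reads the file.

```scala
object ConlluHeads {
  // One token row: 1-based ID, surface form, and the ID of its head (0 = root).
  final case class Row(id: Int, form: String, head: Int)

  private def isInt(s: String): Boolean = s.nonEmpty && s.forall(_.isDigit)

  // Keep plain token rows only: comments, blank lines, multiword ranges ("1-2")
  // and empty nodes ("1.1") are skipped because their ID is not a plain integer.
  def parseSentence(lines: Seq[String]): Seq[Row] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map(_.split("\t"))
      .filter(cols => cols.length == 10 && isInt(cols(0)) && isInt(cols(6)))
      .map(cols => Row(cols(0).toInt, cols(1), cols(6).toInt))

  // Resolve each head ID to the head's surface form, using "ROOT" for head 0.
  def headForms(rows: Seq[Row]): Seq[(String, String)] = {
    val formById = rows.map(r => r.id -> r.form).toMap
    rows.map { r =>
      r.form -> (if (r.head == 0) "ROOT" else formById.getOrElse(r.head, "?"))
    }
  }

  // Illustrative sentence, not taken from any treebank.
  val sample: Seq[String] = Seq(
    "# text = She reads books",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_"
  )
}
```

Calling ConlluHeads.headForms(ConlluHeads.parseSentence(ConlluHeads.sample)) pairs every word with its head word, which is the same token-to-head shape the trained model is expected to produce.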
See DependencyParserApproachTestSpec for further reference on how to use this API.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

val dependencyParserApproach = new DependencyParserApproach()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
  .setDependencyTreeBank("src/test/resources/parser/unlabeled/dependency_treebank")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  posTagger,
  dependencyParserApproach
))

// Additional training data is not needed, the dependency parser relies on the dependency tree bank / CoNLL-U only.
val emptyDataSet = Seq.empty[String].toDF("text")
val pipelineModel = pipeline.fit(emptyDataSet)
- See also
TypedDependencyParserApproach to extract labels for the dependencies
- class DependencyParserModel extends AnnotatorModel[DependencyParserModel] with HasSimpleAnnotate[DependencyParserModel]
Unlabeled parser that finds a grammatical relation between two words in a sentence.
A dependency parser provides information about the relationships between words. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words modify (describe) the subject. This can help you find precise answers to specific questions.
This is the instantiated model of the DependencyParserApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with pretrained of the companion object:
val dependencyParserApproach = DependencyParserModel.pretrained()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
The default model is "dependency_conllu", if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the DependencyParserApproachTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserModel
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

val dependencyParser = DependencyParserModel.pretrained()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  posTagger,
  dependencyParser
))

val data = Seq(
  "Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent " +
    "firm Federal Mogul."
).toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(arrays_zip(token.result, dependency.result)) as cols")
  .selectExpr("cols['0'] as token", "cols['1'] as dependency").show(8, truncate = false)
+------------+------------+
|token       |dependency  |
+------------+------------+
|Unions      |ROOT        |
|representing|workers     |
|workers     |Unions      |
|at          |Turner      |
|Turner      |workers     |
|Newall      |say         |
|say         |Unions      |
|they        |disappointed|
+------------+------------+
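The output above pairs each token with its head word. As a rough illustration of how such pairs might be consumed downstream, the following sketch inverts them into a head-to-dependents index and picks out the root. The object HeadIndex and its helpers are hypothetical names, the rows are copied from the eight lines of output shown above, and plain Scala collections are used instead of Spark so the snippet runs on its own.

```scala
object HeadIndex {
  // (token, head) rows copied from the example output.
  val rows: Seq[(String, String)] = Seq(
    "Unions" -> "ROOT",
    "representing" -> "workers",
    "workers" -> "Unions",
    "at" -> "Turner",
    "Turner" -> "workers",
    "Newall" -> "say",
    "say" -> "Unions",
    "they" -> "disappointed"
  )

  // Tokens attached directly to the artificial ROOT node.
  def roots(pairs: Seq[(String, String)]): Seq[String] =
    pairs.collect { case (token, "ROOT") => token }

  // Invert token -> head into head -> dependents, keeping sentence order
  // within each group.
  def dependentsByHead(pairs: Seq[(String, String)]): Map[String, Seq[String]] =
    pairs.groupBy(_._2).map { case (head, group) => head -> group.map(_._1) }
}
```

Because this sketch keys on the word string, a sentence that repeats a word would merge its occurrences; a real pipeline would need a position-based key instead.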
- See also
TypedDependencyParserModel to extract labels for the dependencies
- class Perceptron extends Serializable
- trait ReadablePretrainedDependency extends ParamsAndFeaturesReadable[DependencyParserModel] with HasPretrained[DependencyParserModel]
- class Tagger extends Serializable
Value Members
- object DependencyParserApproach extends DefaultParamsReadable[DependencyParserApproach] with Serializable
This is the companion object of DependencyParserApproach. Please refer to that class for the documentation.
- object DependencyParserModel extends ReadablePretrainedDependency with Serializable
This is the companion object of DependencyParserModel. Please refer to that class for the documentation.
- object TagDictionary
- object Tagger extends Serializable