dep

package dep

Type Members

class DependencyParserApproach extends AnnotatorApproach[DependencyParserModel]

Trains an unlabeled parser that finds a grammatical relation between two words in a sentence.

For instantiated/pretrained models, see DependencyParserModel.

A dependency parser describes how the words in a sentence relate to one another. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words modify (describe) the subject. This can help you find precise answers to specific questions.

The required training data can be set in two different ways (only one can be chosen for a particular model):

- A dependency treebank in the Penn Treebank format, set with setDependencyTreeBank
- A dataset in the CoNLL-U format, set with setConllU

Apart from that, no additional training data is needed.

See DependencyParserApproachTestSpec for further reference on how to use this API.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

val dependencyParserApproach = new DependencyParserApproach()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
  .setDependencyTreeBank("src/test/resources/parser/unlabeled/dependency_treebank")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  posTagger,
  dependencyParserApproach
))

// No additional training data is needed; the dependency parser relies only on the dependency treebank or the CoNLL-U file.
val emptyDataSet = Seq.empty[String].toDF("text")
val pipelineModel = pipeline.fit(emptyDataSet)
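
The example above trains from a Penn Treebank style resource. As a minimal sketch of the second option, the same stage could be configured with setConllU instead. The path and the value names below are hypothetical placeholders, not files or names from the library, and the stage still has to be placed in a pipeline as shown above.

```scala
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserApproach

// Hypothetical path: replace it with the location of your own CoNLL-U training file.
val conllUPath = "path/to/train.conllu"

// Same input and output columns as in the treebank example.
// setConllU is used instead of setDependencyTreeBank; only one of the two may be set.
val dependencyParserFromConllU = new DependencyParserApproach()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
  .setConllU(conllUPath)
```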

See also: TypedDependencyParserApproach, which extracts labels for the dependencies

class DependencyParserModel extends AnnotatorModel[DependencyParserModel] with HasSimpleAnnotate[DependencyParserModel]

Unlabeled parser that finds a grammatical relation between two words in a sentence.

This is the instantiated model of the DependencyParserApproach. For training your own model, please see the documentation of that class.

Pretrained models can be loaded with the pretrained method of the companion object:

val dependencyParser = DependencyParserModel.pretrained()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")

If no name is provided, the default model is "dependency_conllu". For available pretrained models, please see the Models Hub.
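
To pin a specific model rather than rely on the default, a name and a language code can be passed to pretrained. The sketch below requests the default model by name. The language code "en" and the value name are assumptions made for illustration, so check the Models Hub for the names and languages that are actually available.

```scala
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserModel

// Request the model by name. "en" is an assumed language code for this model.
val namedDependencyParser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
```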

For extended examples of usage, see the Examples and the DependencyParserApproachTestSpec.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserModel
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

val dependencyParser = DependencyParserModel.pretrained()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  posTagger,
  dependencyParser
))

val data = Seq(
  "Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent " +
    "firm Federal Mogul."
).toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(arrays_zip(token.result, dependency.result)) as cols")
  .selectExpr("cols['0'] as token", "cols['1'] as dependency").show(8, truncate = false)
+------------+------------+
|token       |dependency  |
+------------+------------+
|Unions      |ROOT        |
|representing|workers     |
|workers     |Unions      |
|at          |Turner      |
|Turner      |workers     |
|Newall      |say         |
|say         |Unions      |
|they        |disappointed|
+------------+------------+
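
Each row of the output pairs a token with its head, and the token attached to ROOT is the root of the sentence. The following plain Scala sketch does not use Spark. It copies the eight rows shown above into a hypothetical tokenToHead sequence and inverts it to list the dependents of each head. It keys tokens by their surface form, which is a simplification: a sentence with repeated words would need token positions instead.

```scala
// Token/head pairs copied from the eight rows of the output above.
val tokenToHead: Seq[(String, String)] = Seq(
  "Unions" -> "ROOT",
  "representing" -> "workers",
  "workers" -> "Unions",
  "at" -> "Turner",
  "Turner" -> "workers",
  "Newall" -> "say",
  "say" -> "Unions",
  "they" -> "disappointed"
)

// Invert the relation: for every head, collect the tokens attached to it,
// keeping the order in which they appear in the sentence.
val dependentsByHead: Map[String, Seq[String]] =
  tokenToHead.groupBy(_._2).map { case (head, pairs) => head -> pairs.map(_._1) }

// The token whose head is ROOT is the root of the sentence.
val root: Option[String] = tokenToHead.collectFirst { case (token, "ROOT") => token }

println(s"root: $root")
println(s"dependents of 'workers': ${dependentsByHead.getOrElse("workers", Seq.empty)}")
```

Because the parser is unlabeled, this structure only says which word attaches to which. The kind of relation (subject, object, modifier) requires the typed parser mentioned below.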

See also: TypedDependencyParserModel, which extracts labels for the dependencies

class Perceptron extends Serializable
trait ReadablePretrainedDependency extends ParamsAndFeaturesReadable[DependencyParserModel] with HasPretrained[DependencyParserModel]
class Tagger extends Serializable

Value Members

object DependencyParserApproach extends DefaultParamsReadable[DependencyParserApproach] with Serializable
This is the companion object of DependencyParserApproach. Please refer to that class for the documentation.
object DependencyParserModel extends ReadablePretrainedDependency with Serializable
This is the companion object of DependencyParserModel. Please refer to that class for the documentation.
object TagDictionary
object Tagger extends Serializable
