`sparknlp.training.conllu`

Contains classes for CoNLLU.

Module Contents

Classes

CoNLLU

Instantiates the class to read a CoNLL-U dataset.

class CoNLLU(textCol='text', documentCol='document', sentenceCol='sentence', formCol='form', uposCol='upos', xposCol='xpos', lemmaCol='lemma', explodeSentences=True)

Instantiates the class to read a CoNLL-U dataset.

The dataset must be in CoNLL-U format and is loaded with readDataset(), which returns a DataFrame with the data.

Can be used to train a DependencyParserApproach.

Input File Format:

# sent_id = 1
# text = They buy and sell books.
1   They     they    PRON    PRP    Case=Nom|Number=Plur               2   nsubj   2:nsubj|4:nsubj   _
2   buy      buy     VERB    VBP    Number=Plur|Person=3|Tense=Pres    0   root    0:root            _
3   and      and     CONJ    CC     _                                  4   cc      4:cc              _
4   sell     sell    VERB    VBP    Number=Plur|Person=3|Tense=Pres    2   conj    0:root|2:conj     _
5   books    book    NOUN    NNS    Number=Plur                        2   obj     2:obj|4:obj       SpaceAfter=No
6   .        .       PUNCT   .      _                                  2   punct   2:punct           _

Examples

>>> from sparknlp.training import CoNLLU
>>> conlluFile = "src/test/resources/conllu/en.test.conllu"
>>> conllDataSet = CoNLLU(explodeSentences=False).readDataset(spark, conlluFile)
>>> conllDataSet.selectExpr(
...     "text",
...     "form.result as form",
...     "upos.result as upos",
...     "xpos.result as xpos",
...     "lemma.result as lemma"
... ).show(1, False)
+---------------------------------------+----------------------------------------------+---------------------------------------------+------------------------------+--------------------------------------------+
|text                                   |form                                          |upos                                         |xpos                          |lemma                                       |
+---------------------------------------+----------------------------------------------+---------------------------------------------+------------------------------+--------------------------------------------+
|What if Google Morphed Into GoogleOS?  |[What, if, Google, Morphed, Into, GoogleOS, ?]|[PRON, SCONJ, PROPN, VERB, ADP, PROPN, PUNCT]|[WP, IN, NNP, VBD, IN, NNP, .]|[what, if, Google, morph, into, GoogleOS, ?]|
+---------------------------------------+----------------------------------------------+---------------------------------------------+------------------------------+--------------------------------------------+

readDataset(spark, path, read_as=ReadAs.TEXT)

Reads the dataset from an external resource.

Parameters:

spark (pyspark.sql.SparkSession): Initiated Spark Session with Spark NLP
path (str): Path to the resource
read_as (str, optional): How to read the resource, by default ReadAs.TEXT

Returns:

pyspark.sql.DataFrame: Spark DataFrame with the data
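To try readDataset() without an existing corpus, one option is to write a single CoNLL-U sentence to a temporary file and pass that path in. The sketch below does this with the standard library only. The helpers `write_sample` and `load_conllu` are hypothetical names introduced for illustration, and the Spark NLP import is deferred so that the file-writing part runs even where Spark NLP is not installed.

```python
import os
import tempfile

# One sentence in CoNLL-U layout: ten columns per token.
TOKENS = [
    ("1", "They", "they", "PRON", "PRP", "Case=Nom|Number=Plur", "2", "nsubj", "2:nsubj|4:nsubj", "_"),
    ("2", "buy", "buy", "VERB", "VBP", "Number=Plur|Person=3|Tense=Pres", "0", "root", "0:root", "_"),
    ("3", "and", "and", "CONJ", "CC", "_", "4", "cc", "4:cc", "_"),
    ("4", "sell", "sell", "VERB", "VBP", "Number=Plur|Person=3|Tense=Pres", "2", "conj", "0:root|2:conj", "_"),
    ("5", "books", "book", "NOUN", "NNS", "Number=Plur", "2", "obj", "2:obj|4:obj", "SpaceAfter=No"),
    ("6", ".", ".", "PUNCT", ".", "_", "2", "punct", "2:punct", "_"),
]


def write_sample(directory):
    """Write the sample sentence as a tab-separated CoNLL-U file (hypothetical helper)."""
    path = os.path.join(directory, "sample.conllu")
    with open(path, "w", encoding="utf-8") as f:
        f.write("# sent_id = 1\n")
        f.write("# text = They buy and sell books.\n")
        for token in TOKENS:
            f.write("\t".join(token) + "\n")
        f.write("\n")  # a blank line terminates the sentence
    return path


def load_conllu(spark, path):
    """Read a CoNLL-U file with the default column names (hypothetical helper)."""
    # Imported lazily: this requires Spark NLP to be installed.
    from sparknlp.training import CoNLLU
    return CoNLLU().readDataset(spark, path)


sample_path = write_sample(tempfile.mkdtemp())

# With Spark NLP installed, usage would look roughly like:
#   import sparknlp
#   spark = sparknlp.start()
#   load_conllu(spark, sample_path).printSchema()
```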
