Explain Document DL

Description

explain_document_dl is a pretrained pipeline that runs a fixed sequence of basic text-processing steps: sentence detection, tokenization, spell checking, stemming, lemmatization, and part-of-speech tagging.


How to use

from sparknlp.pretrained import PretrainedPipeline

# Download (on first use) and load the pretrained English pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang = 'en')

annotations = pipeline.fullAnnotate("""French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.""")[0]

annotations.keys()

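import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Scala string literals need double quotes; single quotes denote a Char
val pipeline = new PretrainedPipeline("explain_document_dl", lang = "en")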

val result = pipeline.fullAnnotate("French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.")(0)

import nlu

text = ["""John Snow built a detailed map of all the households where people died, and came to the conclusion that the fault was one public water pump that all the victims had used."""]
explain_df = nlu.load('en.explain.dl').predict(text)
explain_df

Results

+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|                text|            document|            sentence|               token|               spell|              lemmas|               stems|                 pos|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|French author who...|[[document, 0, 23...|[[document, 0, 57...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, fr...|[[pos, 0, 5, JJ, ...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
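
Each output column holds a list of annotations. As a minimal sketch of reading them, the helper below pairs every token with its spell-checked form and keeps only the pairs that differ. It assumes each annotation exposes a `result` attribute, as Spark NLP's Python `Annotation` objects do, and that the `token` and `spell` columns are aligned one-to-one. `FakeAnnotation`, `corrected_tokens`, and the sample values are hypothetical stand-ins so the snippet runs without a Spark session.

```python
from collections import namedtuple

# Hypothetical stand-in for Spark NLP's Annotation (same attribute names),
# used only so this sketch runs without Spark.
FakeAnnotation = namedtuple("FakeAnnotation", ["annotator_type", "begin", "end", "result"])


def corrected_tokens(annotations, token_col="token", spell_col="spell"):
    """Return (original, corrected) pairs where the spell checker changed a token.

    Assumes both columns hold one annotation per token, in the same order.
    """
    pairs = zip(annotations[token_col], annotations[spell_col])
    return [(tok.result, fixed.result) for tok, fixed in pairs if tok.result != fixed.result]


# Illustrative values only, not real pipeline output.
sample = {
    "token": [
        FakeAnnotation("token", 0, 4, "Verne"),
        FakeAnnotation("token", 6, 10, "wrate"),
    ],
    "spell": [
        FakeAnnotation("token", 0, 4, "Verne"),
        FakeAnnotation("token", 6, 10, "wrote"),
    ],
}

print(corrected_tokens(sample))
```

With a real result, the same helper could be called as `corrected_tokens(annotations)` on the dictionary returned by `fullAnnotate(...)[0]`.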

Model Information

Model Name:	explain_document_dl
Type:	pipeline
Compatibility:	Spark NLP 2.5.5+
License:	Open Source
Edition:	Community
Language:	[en]

Included Models

The explain_document_dl pipeline consists of one Transformer and six annotators:

DocumentAssembler - A Transformer that creates a column that contains documents.
Sentence Segmenter - An annotator that produces the sentences of the document.
Tokenizer - An annotator that produces the tokens of the sentences.
SpellChecker - An annotator that produces the spelling-corrected tokens.
Stemmer - An annotator that produces the stems of the tokens.
Lemmatizer - An annotator that produces the lemmas of the tokens.
POS Tagger - An annotator that produces the parts of speech of the associated tokens.
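
The stages listed above can be matched to the output columns shown in the Results table. The sketch below records that mapping and reports which expected columns are absent from a result. The stage-to-column pairing is an assumption inferred from the table header rather than something the pipeline documents, and `STAGE_OUTPUT_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration.

```python
# Stage -> output column, inferred from the Results table header (an assumption).
STAGE_OUTPUT_COLUMNS = {
    "DocumentAssembler": "document",
    "Sentence Segmenter": "sentence",
    "Tokenizer": "token",
    "SpellChecker": "spell",
    "Stemmer": "stems",
    "Lemmatizer": "lemmas",
    "POS Tagger": "pos",
}


def missing_columns(keys):
    """Return expected output columns absent from `keys`, in the order the stages are listed."""
    present = set(keys)
    return [col for col in STAGE_OUTPUT_COLUMNS.values() if col not in present]


# A complete result has nothing missing; a partial one lists the gaps.
print(missing_columns(["document", "sentence", "token", "spell", "lemmas", "stems", "pos"]))
print(missing_columns(["document", "token"]))
```

Passing `annotations.keys()` from the Python example gives a quick check that every stage produced output.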
