Explain Document DL Pipeline for English

Description

explain_document_dl is a pretrained pipeline that runs the most common text-processing steps on a DataFrame and recognizes named entities. Its stages cover sentence detection, tokenization, spell checking, lemmatization, stemming, part-of-speech tagging, word embeddings, and named entity recognition.

How to use

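Python:

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

pipeline = PretrainedPipeline('explain_document_dl', lang='en')
# fullAnnotate returns one dict per input text; take the first
annotations = pipeline.fullAnnotate("The Mona Lisa is an oil painting from the 16th century.")[0]
annotations.keys()

Scala:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("explain_document_dl", lang = "en")
// For a single String, fullAnnotate returns a Map keyed by output column name
val result = pipeline.fullAnnotate("The Mona Lisa is an oil painting from the 16th century.")

NLU:

import nlu

text = ["The Mona Lisa is an oil painting from the 16th century."]
result_df = nlu.load('en.explain.dl').predict(text)
result_df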

Results

+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------+-----------+
|                                              text|                                          document|                                          sentence|                                             token|                                           checked|                                             lemma|                                              stem|                                               pos|                                        embeddings|                                         ner|   entities|
+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------+-----------+
|The Mona Lisa is an oil painting from the 16th ...|[The Mona Lisa is an oil painting from the 16th...|[The Mona Lisa is an oil painting from the 16th...|[The, Mona, Lisa, is, an, oil, painting, from, ...|[The, Mona, Lisa, is, an, oil, painting, from, ...|[The, Mona, Lisa, be, an, oil, painting, from, ...|[the, mona, lisa, i, an, oil, paint, from, the,...|[DT, NNP, NNP, VBZ, DT, NN, NN, IN, DT, JJ, NN, .]|[[-0.038194, -0.24487, 0.72812, -0.39961, 0.083...|[O, B-PER, I-PER, O, O, O, O, O, O, O, O, O]|[Mona Lisa]|
+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------+-----------+
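
The `ner` column holds one IOB tag per token, and the `entities` column holds the chunks merged from those tags. The following is a minimal plain-Python sketch of that grouping, not Spark NLP's NerConverter implementation. The helper name `group_iob` is hypothetical, and the 12-token split is an assumption based on the input sentence and the 12 tags shown in the `pos` and `ner` columns.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []          # tokens of the chunk being built
    label: Optional[str] = None      # entity label of the chunk being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new chunk; close any open one first.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # An I- tag with the same label continues the open chunk.
            current.append(token)
        else:
            # "O" (or a stray I- tag) closes the open chunk, if any.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None

    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Tokens assumed from the input sentence; tags copied from the `ner` column.
tokens = ["The", "Mona", "Lisa", "is", "an", "oil", "painting",
          "from", "the", "16th", "century", "."]
tags = ["O", "B-PER", "I-PER", "O", "O", "O", "O", "O", "O", "O", "O", "O"]

entities = group_iob(tokens, tags)
print(entities)  # [('Mona Lisa', 'PER')]
```

The result matches the single `Mona Lisa` chunk in the `entities` column above.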


Model Information

Model Name: explain_document_dl
Type: pipeline
Compatibility: Spark NLP 4.4.2+
License: Open Source
Edition: Official
Language: en
Size: 176.2 MB

Included Models

  • DocumentAssembler
  • SentenceDetector
  • TokenizerModel
  • NorvigSweetingModel
  • LemmatizerModel
  • Stemmer
  • PerceptronModel
  • WordEmbeddingsModel
  • NerDLModel
  • NerConverter
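
Each included stage appears to correspond to one output column of the Results table, in the same order. The mapping sketched below is inferred by pairing the two lists on this page. It is not read from the pipeline's metadata, so treat it as an assumption. `STAGE_TO_COLUMN` and `columns_for` are hypothetical names used only for illustration.

```python
# Stage -> output column, inferred by pairing the Included Models list
# with the column order of the Results table (an assumption, not an API).
STAGE_TO_COLUMN = {
    "DocumentAssembler": "document",
    "SentenceDetector": "sentence",
    "TokenizerModel": "token",
    "NorvigSweetingModel": "checked",
    "LemmatizerModel": "lemma",
    "Stemmer": "stem",
    "PerceptronModel": "pos",
    "WordEmbeddingsModel": "embeddings",
    "NerDLModel": "ner",
    "NerConverter": "entities",
}


def columns_for(stages):
    """Return the output columns assumed to be produced by the given stages."""
    return [STAGE_TO_COLUMN[stage] for stage in stages]


ner_columns = columns_for(["NerDLModel", "NerConverter"])
print(ner_columns)  # ['ner', 'entities']
```

A lookup like this can help when selecting only the columns of interest from the annotated output.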