sparknlp.pretrained.pretrained_pipeline#

Contains classes for the PretrainedPipeline.

Module Contents#

Classes#

PretrainedPipeline

Represents a fully constructed and trained Spark NLP pipeline,

class PretrainedPipeline(name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None)[source]#

Represents a fully constructed and trained Spark NLP pipeline, ready to be used.

This way, a whole pipeline can be defined in one line. Additionally, the LightPipeline version of the model can be retrieved with the member light_model.

For more extended examples, see the Pipelines page and our GitHub Model Repository for available pipeline models.
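As a minimal sketch of the one-line usage and the light_model member (assuming Spark NLP is installed, a Spark session can be started via sparknlp.start(), and network access is available to download the pretrained model):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or retrieve) a SparkSession configured for Spark NLP.
spark = sparknlp.start()

# A whole pipeline defined in one line.
pipeline = PretrainedPipeline("explain_document_dl")

# The LightPipeline variant runs directly on Python strings,
# avoiding DataFrame overhead for small inputs.
light = pipeline.light_model
annotations = light.annotate("U.N. official Ekeus heads for Baghdad.")
print(annotations["token"])
```

LightPipeline is typically preferred for low-latency, small-batch inference, while transform() is preferred for large datasets.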

Parameters:
name : str

Name of the PretrainedPipeline. These can be gathered from the Pipelines Page.

lang : str, optional

Language of the model, by default 'en'

remote_loc : str, optional

Link to the remote location of the model (if it was already downloaded), by default None

parse_embeddings : bool, optional

Whether to parse embeddings, by default False

disk_location : str, optional

Path to locally stored PretrainedPipeline, by default None

annotate(target, column=None)[source]#

Annotates the data provided, extracting the results.

The data should be either a list or a str.

Parameters:
target : list or str

The data to be annotated

Returns:
List[dict] or dict

The result of the annotation

Examples

>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.annotate('U.N. official Ekeus heads for Baghdad.')
>>> result.keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result["ner"]
['B-ORG', 'O', 'O', 'B-PER', 'O', 'O', 'B-LOC', 'O']
fullAnnotate(target, optional_target='')[source]#

Annotates the data provided into Annotation type results.

The data should be either a list or a str.

Parameters:
target : list or str

The data to be annotated

optional_target : list or str, optional

Secondary input for pipelines that take two text inputs (for example, a question for question-answering pipelines), by default ''

Returns:
List[dict]

The result of the annotation, where each dict maps output column names to lists of Annotation

Examples

>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.fullAnnotate('U.N. official Ekeus heads for Baghdad.')
>>> result[0].keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result[0]["ner"]
[Annotation(named_entity, 0, 2, B-ORG, {'word': 'U.N'}),
Annotation(named_entity, 3, 3, O, {'word': '.'}),
Annotation(named_entity, 5, 12, O, {'word': 'official'}),
Annotation(named_entity, 14, 18, B-PER, {'word': 'Ekeus'}),
Annotation(named_entity, 20, 24, O, {'word': 'heads'}),
Annotation(named_entity, 26, 28, O, {'word': 'for'}),
Annotation(named_entity, 30, 36, B-LOC, {'word': 'Baghdad'}),
Annotation(named_entity, 37, 37, O, {'word': '.'})]
fullAnnotateImage(path_to_image)[source]#

Annotates the image(s) provided, producing AnnotationImage type results.

The input should be either a single path (str) or a list of paths.

Parameters:
path_to_image : list or str

Source path of image, list of paths to images

Returns:
List[AnnotationImage]

The result of the annotation
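
A hedged sketch of image annotation. The pipeline name below is illustrative (pick an actual image pipeline from the Pipelines Page), and the image path is a placeholder; running this requires Spark NLP with a Spark session and network access to download the model:

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical image-classification pipeline name; substitute a real one
# from the Pipelines Page.
pipeline = PretrainedPipeline("pipeline_image_classifier_vit_base_patch16_224")

# A single path or a list of paths may be passed.
result = pipeline.fullAnnotateImage("./images/hen.JPEG")

# Each element corresponds to one image and maps output column names
# to lists of results.
print(result[0].keys())
```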

transform(data)[source]#

Transforms the input dataset with Spark.

Parameters:
data : pyspark.sql.DataFrame

Input dataset

Returns:
pyspark.sql.DataFrame

Transformed dataset
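
A minimal sketch of DataFrame-based usage, assuming a Spark session started via sparknlp.start() and network access to download the pretrained model. Pretrained pipelines generally expect the input column to be named "text":

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("explain_document_dl")

# Build a one-row DataFrame with the expected "text" input column.
data = spark.createDataFrame(
    [["U.N. official Ekeus heads for Baghdad."]]
).toDF("text")

# transform() appends the pipeline's output columns to the DataFrame.
result = pipeline.transform(data)
result.select("ner.result").show(truncate=False)
```

Unlike annotate(), transform() is lazy and distributed, so it is the appropriate entry point for large datasets.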