sparknlp.pretrained.pretrained_pipeline
Contains classes for the PretrainedPipeline.
Module Contents#
Classes#
PretrainedPipeline: Represents a fully constructed and trained Spark NLP pipeline, ready to be used.
- class PretrainedPipeline(name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None)[source]#
Represents a fully constructed and trained Spark NLP pipeline, ready to be used.
This way, a whole pipeline can be defined in one line. Additionally, the
LightPipeline version of the model can be retrieved with the member
light_model, as shown in the sketch after the parameter list below.
For more extended examples see the Pipelines page and our GitHub Model Repository for available pipeline models.
- Parameters:
- name : str
  Name of the PretrainedPipeline. These can be gathered from the Pipelines Page.
- lang : str, optional
  Language of the model, by default 'en'
- remote_loc : str, optional
  Link to the remote location of the model (if it was already downloaded), by default None
- parse_embeddings : bool, optional
  Whether to parse embeddings, by default False
- disk_location : str, optional
  Path to a locally stored PretrainedPipeline, by default None
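A minimal usage sketch (assuming the "explain_document_dl" pipeline used in the examples below is available for download):

>>> from sparknlp.pretrained import PretrainedPipeline
>>> pipeline = PretrainedPipeline("explain_document_dl", lang="en")
>>> light = pipeline.light_model  # LightPipeline variant for fast in-memory annotation
>>> annotations = light.annotate("Spark NLP ships pretrained pipelines.")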
- annotate(target, column=None)[source]#
Annotates the data provided, extracting the results.
The data should be either a list or a str.
- Parameters:
- target : list or str
The data to be annotated
- Returns:
- List[dict] or dict
The result of the annotation
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.annotate('U.N. official Ekeus heads for Baghdad.')
>>> result.keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result["ner"]
['B-ORG', 'O', 'O', 'B-PER', 'O', 'O', 'B-LOC', 'O']
- fullAnnotate(target, optional_target='')[source]#
Annotates the data provided into Annotation type results.
The data should be either a list or a str.
- Parameters:
- target : list or str
The data to be annotated
- Returns:
- List[dict]
  The result of the annotation, as dictionaries mapping each output column to its list of Annotation
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.fullAnnotate('U.N. official Ekeus heads for Baghdad.')
>>> result[0].keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result[0]["ner"]
[Annotation(named_entity, 0, 2, B-ORG, {'word': 'U.N'}),
Annotation(named_entity, 3, 3, O, {'word': '.'}),
Annotation(named_entity, 5, 12, O, {'word': 'official'}),
Annotation(named_entity, 14, 18, B-PER, {'word': 'Ekeus'}),
Annotation(named_entity, 20, 24, O, {'word': 'heads'}),
Annotation(named_entity, 26, 28, O, {'word': 'for'}),
Annotation(named_entity, 30, 36, B-LOC, {'word': 'Baghdad'}),
Annotation(named_entity, 37, 37, O, {'word': '.'})]
- fullAnnotateImage(path_to_image)[source]#
Annotates the images provided into AnnotationImage type results.
The data should be either a str or a list of str.
- Parameters:
- path_to_image : str or list
  Source path of the image, or a list of paths to images
- Returns:
- List[AnnotationImage]
The result of the annotation
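Examples

A minimal sketch; the pipeline name and image path here are placeholders, not real assets, so substitute an actual image pipeline from the Models Hub:

>>> from sparknlp.pretrained import PretrainedPipeline
>>> image_pipeline = PretrainedPipeline("image_classifier_pipeline")  # hypothetical name
>>> result = image_pipeline.fullAnnotateImage("path/to/image.jpg")  # hypothetical path
>>> result[0].keys()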
- transform(data)[source]#
Transforms the input dataset with Spark.
- Parameters:
- data : pyspark.sql.DataFrame
  The input dataset
- Returns:
- pyspark.sql.DataFrame
  The transformed dataset
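Examples

A minimal sketch, assuming an active SparkSession named spark and that the pipeline's first stage reads from a column named "text" (the usual default for pretrained pipelines):

>>> from sparknlp.pretrained import PretrainedPipeline
>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> annotated = pipeline.transform(data)
>>> annotated.select("ner.result").show(truncate=False)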