sparknlp.pretrained.pretrained_pipeline
Contains classes for the PretrainedPipeline.
Module Contents#
Classes#
PretrainedPipeline: Represents a fully constructed and trained Spark NLP pipeline, ready to be used.
- class PretrainedPipeline(name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None)[source]#
Represents a fully constructed and trained Spark NLP pipeline, ready to be used.
This way, a whole pipeline can be defined in one line. Additionally, the
LightPipeline version of the model can be retrieved with the member
light_model, as shown in the sketch after the parameter list below.
For more extended examples see the Pipelines page and our GitHub Model Repository for available pipeline models.
- Parameters:
- name : str
  Name of the PretrainedPipeline. These can be gathered from the Pipelines Page.
- lang : str, optional
  Language of the model, by default 'en'
- remote_loc : str, optional
  Link to the remote location of the model (if it was already downloaded), by default None
- parse_embeddings : bool, optional
  Whether to parse embeddings, by default False
- disk_location : str, optional
  Path to a locally stored PretrainedPipeline, by default None
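A minimal usage sketch (assuming the "explain_document_dl" pipeline used in the examples below is available for download):

>>> from sparknlp.pretrained import PretrainedPipeline
>>> pipeline = PretrainedPipeline("explain_document_dl", lang="en")
>>> light = pipeline.light_model  # LightPipeline variant for fast in-memory annotation
>>> annotations = light.annotate("Spark NLP ships pretrained pipelines.")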
- annotate(target, column=None)[source]#
Annotates the data provided, extracting the results.
The data should be either a list or a str.
- Parameters:
- target : list or str
The data to be annotated
- Returns:
- List[dict] or dict
The result of the annotation
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.annotate('U.N. official Ekeus heads for Baghdad.')
>>> result.keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result["ner"]
['B-ORG', 'O', 'O', 'B-PER', 'O', 'O', 'B-LOC', 'O']
- fullAnnotate(target, optional_target='')[source]#
Annotates the data provided into Annotation type results.
The data should be either a list or a str.
- Parameters:
- target : list or str
The data to be annotated
- Returns:
- List[dict]
  The result of the annotation, as dictionaries mapping each output column to its list of Annotation
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.fullAnnotate('U.N. official Ekeus heads for Baghdad.')
>>> result[0].keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result[0]["ner"]
[Annotation(named_entity, 0, 2, B-ORG, {'word': 'U.N'}),
Annotation(named_entity, 3, 3, O, {'word': '.'}),
Annotation(named_entity, 5, 12, O, {'word': 'official'}),
Annotation(named_entity, 14, 18, B-PER, {'word': 'Ekeus'}),
Annotation(named_entity, 20, 24, O, {'word': 'heads'}),
Annotation(named_entity, 26, 28, O, {'word': 'for'}),
Annotation(named_entity, 30, 36, B-LOC, {'word': 'Baghdad'}),
Annotation(named_entity, 37, 37, O, {'word': '.'})]
- fullAnnotateImage(path_to_image)[source]#
Annotates the images provided into AnnotationImage type results.
The data should be either a str or a list of str.
- Parameters:
- path_to_image : str or list
  Source path of the image, or a list of paths to images
- Returns:
- List[AnnotationImage]
The result of the annotation
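Examples

A minimal sketch; the pipeline name and image path here are placeholders, not real assets, so substitute an actual image pipeline from the Models Hub:

>>> from sparknlp.pretrained import PretrainedPipeline
>>> image_pipeline = PretrainedPipeline("image_classifier_pipeline")  # hypothetical name
>>> result = image_pipeline.fullAnnotateImage("path/to/image.jpg")  # hypothetical path
>>> result[0].keys()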
- transform(data)[source]#
Transforms the input dataset with Spark.
- Parameters:
- data : pyspark.sql.DataFrame
  The input dataset
- Returns:
- pyspark.sql.DataFrame
  The transformed dataset
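Examples

A minimal sketch, assuming an active SparkSession named spark and that the pipeline's first stage reads from a column named "text" (the usual default for pretrained pipelines):

>>> from sparknlp.pretrained import PretrainedPipeline
>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> annotated = pipeline.transform(data)
>>> annotated.select("ner.result").show(truncate=False)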