sparknlp.base.light_pipeline#
Contains classes for the LightPipeline.
Module Contents#
Classes#
LightPipeline | Creates a LightPipeline from a Spark PipelineModel.
- class LightPipeline(pipelineModel, parse_embeddings=False, output_cols=None)[source]#
Creates a LightPipeline from a Spark PipelineModel.
LightPipeline is a Spark NLP specific Pipeline class, the equivalent of the Spark ML Pipeline. The difference is that its execution does not adhere to Spark principles: instead, it computes everything locally (but in parallel) in order to achieve fast results when dealing with small amounts of data. This means the input is not a Spark DataFrame, but a string or an array of strings to be annotated. To create a LightPipeline, you need to provide an already trained (fit) Spark ML Pipeline.
It’s
transform()has now an alternativeannotate(), which directly outputs the results.- Parameters:
- pipelineModel : pyspark.ml.PipelineModel
The PipelineModel containing Spark NLP Annotators
- parse_embeddings : bool, optional
Whether to parse embeddings, by default False
Notes
Use fullAnnotate() to also output the result as Annotation, with metadata.
Examples
>>> from sparknlp.base import LightPipeline
>>> light = LightPipeline(pipeline.fit(data))
>>> light.annotate("We are very happy about Spark NLP")
{
    'document': ['We are very happy about Spark NLP'],
    'lemmas': ['We', 'be', 'very', 'happy', 'about', 'Spark', 'NLP'],
    'pos': ['PRP', 'VBP', 'RB', 'JJ', 'IN', 'NNP', 'NNP'],
    'sentence': ['We are very happy about Spark NLP'],
    'spell': ['We', 'are', 'very', 'happy', 'about', 'Spark', 'NLP'],
    'stems': ['we', 'ar', 'veri', 'happi', 'about', 'spark', 'nlp'],
    'token': ['We', 'are', 'very', 'happy', 'about', 'Spark', 'NLP']
}
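The pipeline and data referenced above are assumed to exist already. A minimal sketch of how such a fit pipeline might be built (using only a DocumentAssembler and a Tokenizer, so the output contains fewer columns than shown above):
>>> import sparknlp
>>> from pyspark.ml import Pipeline
>>> from sparknlp.base import DocumentAssembler, LightPipeline
>>> from sparknlp.annotator import Tokenizer
>>> spark = sparknlp.start()
>>> data = spark.createDataFrame([["We are very happy about Spark NLP"]]).toDF("text")
>>> documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
>>> tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
>>> pipeline = Pipeline().setStages([documentAssembler, tokenizer])
>>> light = LightPipeline(pipeline.fit(data))
>>> light.annotate("We are very happy about Spark NLP")["token"]
['We', 'are', 'very', 'happy', 'about', 'Spark', 'NLP']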
- fullAnnotate(*args, **kwargs)[source]#
Annotate and return full Annotation objects.
- Supported signatures:
fullAnnotate(text: str)
fullAnnotate(texts: list[str])
fullAnnotate(ids: list[int], texts: list[str])
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.fullAnnotate('U.N. official Ekeus heads for Baghdad.')
>>> result[0].keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result[0]["ner"]
[Annotation(named_entity, 0, 2, B-ORG, {'word': 'U.N'}),
Annotation(named_entity, 3, 3, O, {'word': '.'}),
Annotation(named_entity, 5, 12, O, {'word': 'official'}),
Annotation(named_entity, 14, 18, B-PER, {'word': 'Ekeus'}),
Annotation(named_entity, 20, 24, O, {'word': 'heads'}),
Annotation(named_entity, 26, 28, O, {'word': 'for'}),
Annotation(named_entity, 30, 36, B-LOC, {'word': 'Baghdad'}),
Annotation(named_entity, 37, 37, O, {'word': '.'})]
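For the ids/texts variant, each id is paired with the text at the same position, and one result is returned per pair. A short sketch, assuming light is the LightPipeline from the class-level example above (the ids and sentences are illustrative):
>>> ids = [1, 2]
>>> texts = ["U.N. official Ekeus heads for Baghdad.",
...          "We are very happy about Spark NLP"]
>>> results = light.fullAnnotate(ids, texts)
>>> len(results)
2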
- fullAnnotateImage(path_to_image, text=None)[source]#
Annotates the image data provided into AnnotationImage type results.
The data should be either a list or a str.
- Parameters:
- path_to_image : list or str
Source path of the image, or a list of paths to images
- text : list or str, optional
Optional text or list of texts. If None, defaults to an empty list if path_to_image is a list, or an empty string if path_to_image is a string.
- Returns:
- List[AnnotationImage]
The result of the annotation
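A brief usage sketch, assuming light wraps a fit pipeline whose first stage accepts image input (for example an ImageAssembler followed by an image classifier); the image paths are placeholders:
>>> # a single image path
>>> results = light.fullAnnotateImage("./images/hen.jpeg")
>>> # a list of image paths; an optional text (or list of texts) can accompany them
>>> results = light.fullAnnotateImage(["./images/hen.jpeg", "./images/hippopotamus.jpeg"])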
- annotate(*args, **kwargs)[source]#
Annotate text(s) or text(s) with IDs using the LightPipeline.
- Supported signatures:
annotate(text: str)
annotate(texts: list[str])
annotate(ids: list[int], texts: list[str])
- Returns:
- list[dict[str, list[str]]]
The result of the annotation
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> result = explain_document_pipeline.annotate('U.N. official Ekeus heads for Baghdad.')
>>> result.keys()
dict_keys(['entities', 'stem', 'checked', 'lemma', 'document', 'pos', 'token', 'ner', 'embeddings', 'sentence'])
>>> result["ner"]
['B-ORG', 'O', 'O', 'B-PER', 'O', 'O', 'B-LOC', 'O']
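When a list of texts is passed, one result dictionary is returned per input text. A short sketch reusing the pretrained pipeline above (the second sentence is an illustrative assumption):
>>> texts = ["U.N. official Ekeus heads for Baghdad.",
...          "We are very happy about Spark NLP"]
>>> results = explain_document_pipeline.annotate(texts)
>>> len(results)
2
>>> results[0]["ner"]
['B-ORG', 'O', 'O', 'B-PER', 'O', 'O', 'B-LOC', 'O']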
- transform(dataframe)[source]#
Transforms a DataFrame with the stages of the LightPipeline.
- Parameters:
- dataframe : pyspark.sql.DataFrame
The DataFrame to be transformed
- Returns:
- pyspark.sql.DataFrame
The transformed DataFrame
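A brief sketch, assuming pipeline, data, and the LightPipeline light from the class-level example above; transform runs the underlying PipelineModel stages on a Spark DataFrame rather than on local strings:
>>> light = LightPipeline(pipeline.fit(data))
>>> transformed = light.transform(data)
>>> transformed.select("token.result").show(truncate=False)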