sparknlp.base.finisher
Contains classes for the Finisher.
Module Contents
Classes
Finisher | Converts annotation results into a format that is easier to use.
- class Finisher
Converts annotation results into a format that is easier to use.
It is useful for extracting the results from Spark NLP Pipelines. The Finisher outputs annotation values as Strings.
For more extended examples on document pre-processing, see the `Examples <JohnSnowLabs/spark-nlp>`__.
Input Annotation types: ANY
Output Annotation type: NONE
- Parameters:
- inputCols
Input annotations
- outputCols
Output finished annotation cols
- valueSplitSymbol
Character separating values, by default #
- annotationSplitSymbol
Character separating annotations, by default @
- cleanAnnotations
Whether to remove annotation columns, by default True
- includeMetadata
Whether to include annotation metadata, by default False
- outputAsArray
Whether to generate an array with the results instead of a string, by default True
- parseEmbeddingsVectors
Whether to include embeddings vectors in the process, by default False
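The interaction between outputAsArray and annotationSplitSymbol can be pictured with a small plain-Python model. This is only an illustrative sketch, not Spark NLP code: the function name `finish_results` and its arguments are hypothetical, and it assumes that the string form simply joins the annotation results with the annotation split symbol (metadata and embeddings are ignored here).

```python
from typing import List, Union


def finish_results(
    results: List[str],
    output_as_array: bool = True,
    annotation_split_symbol: str = "@",
) -> Union[List[str], str]:
    """Hypothetical model of the two output shapes described above.

    Assumption: with outputAsArray disabled, results are joined into one
    string using the annotation split symbol.
    """
    if output_as_array:
        # Array form: one element per annotation result
        return list(results)
    # String form: results separated by the annotation split symbol
    return annotation_split_symbol.join(results)


entities = ["New York", "New Jersey"]
as_array = finish_results(entities)
as_string = finish_results(entities, output_as_array=False)

print(as_array)
print(as_string)
```

Under the same assumption, a string-form value can be turned back into a list with `as_string.split("@")`, which only works if the split symbol never occurs inside a result.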
See also
Finisher
for finishing Strings
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from sparknlp.pretrained import PretrainedPipeline
>>> spark = sparknlp.start()
>>> data = spark.createDataFrame([[1, "New York and New Jersey aren't that far apart actually."]]).toDF("id", "text")
Define pretrained pipeline that extracts Named Entities amongst other things and apply the Finisher on it.
>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> finisher = Finisher().setInputCols("entities").setOutputCols("output")
>>> explainResult = pipeline.transform(data)
Show results.
>>> explainResult.select("entities").show(truncate=False)
+------------------------------------------------------------------------------------------------------------------------------------------------------+
|entities                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[chunk, 0, 7, New York, [entity -> LOC, sentence -> 0, chunk -> 0], []], [chunk, 13, 22, New Jersey, [entity -> LOC, sentence -> 0, chunk -> 1], []]]|
+------------------------------------------------------------------------------------------------------------------------------------------------------+
>>> result = finisher.transform(explainResult)
>>> result.select("output").show(truncate=False)
+----------------------+
|output                |
+----------------------+
|[New York, New Jersey]|
+----------------------+
- setInputCols(*value)
Sets column names of input annotations.
- Parameters:
- *value : List[str]
Input columns for the annotator
- setOutputCols(*value)
Sets column names of finished output annotations.
- Parameters:
- *value : List[str]
List of output columns
- setValueSplitSymbol(value)
Sets character separating values, by default #.
- Parameters:
- value : str
Character to separate values
- setAnnotationSplitSymbol(value)
Sets character separating annotations, by default @.
- Parameters:
- value : str
…
- setCleanAnnotations(value)
Sets whether to remove annotation columns, by default True.
- Parameters:
- value : bool
Whether to remove annotation columns
- setIncludeMetadata(value)
Sets whether to include annotation metadata, by default False.
- Parameters:
- value : bool
Whether to include annotation metadata
- setOutputAsArray(value)
Sets whether to generate an array with the results instead of a string, by default True.
- Parameters:
- value : bool
Whether to generate an array with the results instead of a string
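The setters listed above can also be driven from a single settings mapping, which keeps a Finisher's configuration in one place. The sketch below is a hypothetical convenience, not part of Spark NLP: `SETTINGS` and `apply_settings` are made-up names, the values shown are the documented defaults plus the column names from the example, and the helper assumes that list values should be unpacked for the `*value` setters. It works on any object exposing the documented `set...` methods.

```python
from typing import Any, Dict

# Keys are the documented setter names without the "set" prefix.
# Values are the documented defaults, plus example column names.
SETTINGS: Dict[str, Any] = {
    "InputCols": ["entities"],
    "OutputCols": ["output"],
    "ValueSplitSymbol": "#",
    "AnnotationSplitSymbol": "@",
    "CleanAnnotations": True,
    "IncludeMetadata": False,
    "OutputAsArray": True,
}


def apply_settings(finisher: Any, settings: Dict[str, Any]) -> Any:
    """Call set<Name>(value) on `finisher` for every entry in `settings`.

    Hypothetical helper: list or tuple values are unpacked, matching the
    setInputCols(*value) / setOutputCols(*value) signatures.
    """
    for name, value in settings.items():
        setter = getattr(finisher, "set" + name)
        if isinstance(value, (list, tuple)):
            setter(*value)
        else:
            setter(value)
    return finisher
```

With Spark NLP installed, the intended call would be `apply_settings(Finisher(), SETTINGS)`, which should be equivalent to chaining the same setters by hand.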