sparknlp.base.finisher#

Contains classes for the Finisher.

Module Contents#

Classes#

Finisher

Converts annotation results into a format that is easier to use.

class Finisher[source]#

Converts annotation results into a format that is easier to use.

It is useful for extracting the results from Spark NLP Pipelines. The Finisher outputs annotation values as strings.

For more extended examples on document pre-processing see the `Examples <JohnSnowLabs/spark-nlp>`__.

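====================== ======================
Input Annotation types Output Annotation type
====================== ======================
ANY                    NONE
====================== ======================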

Parameters:
inputCols

Input annotations

outputCols

Output finished annotation cols

valueSplitSymbol

Character separating values, by default #

annotationSplitSymbol

Character separating annotations, by default @

cleanAnnotations

Whether to remove annotation columns, by default True

includeMetadata

Whether to include annotation metadata, by default False

outputAsArray

Whether to generate an array with the results instead of a string, by default True

parseEmbeddingsVectors

Whether to include embeddings vectors in the process, by default False
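The split symbols only matter when the output is a single string rather than an array. The following pure-Python sketch illustrates the idea without Spark. `flatten_results` is a hypothetical helper, not part of sparknlp, and it assumes that the string form is simply the annotation results joined by `annotationSplitSymbol`, which this page does not spell out.

```python
# Hypothetical illustration only: this is not the sparknlp implementation.
def flatten_results(results, annotation_split_symbol="@", output_as_array=True):
    """Return the results as a list, or joined into one delimited string."""
    if output_as_array:
        return list(results)
    # Assumption: annotations are joined with the annotation split symbol.
    return annotation_split_symbol.join(results)


entities = ["New York", "New Jersey"]

as_array = flatten_results(entities)
as_string = flatten_results(entities, output_as_array=False)

print(as_array)   # ['New York', 'New Jersey']
print(as_string)  # New York@New Jersey
```

Under this assumption, choosing a split symbol that cannot occur in the annotated text keeps the string form unambiguous to split back into its parts.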

See also

Finisher

for finishing Strings

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from sparknlp.pretrained import PretrainedPipeline
>>> data = spark.createDataFrame([[1, "New York and New Jersey aren't that far apart actually."]]).toDF("id", "text")

Define a pretrained pipeline that extracts named entities, among other things, and apply the Finisher to its output.

>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> finisher = Finisher().setInputCols("entities").setOutputCols("output")
>>> explainResult = pipeline.transform(data)

Show results.

>>> explainResult.select("entities").show(truncate=False)
+------------------------------------------------------------------------------------------------------------------------------------------------------+
|entities                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[chunk, 0, 7, New York, [entity -> LOC, sentence -> 0, chunk -> 0], []], [chunk, 13, 22, New Jersey, [entity -> LOC, sentence -> 0, chunk -> 1], []]]|
+------------------------------------------------------------------------------------------------------------------------------------------------------+
>>> result = finisher.transform(explainResult)
>>> result.select("output").show(truncate=False)
+----------------------+
|output                |
+----------------------+
|[New York, New Jersey]|
+----------------------+
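
Conceptually, the transformation above keeps only the result text of each annotation. Here is a minimal pure-Python sketch of that step, using plain dictionaries in place of Spark NLP annotation rows. The dictionary layout and the `finish` helper are assumptions made for illustration, not library code.

```python
# Plain dicts standing in for the two chunk annotations shown above.
annotations = [
    {"annotatorType": "chunk", "begin": 0, "end": 7, "result": "New York",
     "metadata": {"entity": "LOC", "sentence": "0", "chunk": "0"}},
    {"annotatorType": "chunk", "begin": 13, "end": 22, "result": "New Jersey",
     "metadata": {"entity": "LOC", "sentence": "0", "chunk": "1"}},
]


def finish(rows):
    """Hypothetical helper: keep only the text result of each annotation."""
    return [row["result"] for row in rows]


output = finish(annotations)
print(output)  # ['New York', 'New Jersey']
```
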
setInputCols(*value)[source]#

Sets column names of input annotations.

Parameters:
*value : List[str]

Input columns for the annotator

setOutputCols(*value)[source]#

Sets column names of finished output annotations.

Parameters:
*value : List[str]

List of output columns

setValueSplitSymbol(value)[source]#

Sets character separating values, by default #.

Parameters:
value : str

Character separating values

setAnnotationSplitSymbol(value)[source]#

Sets character separating annotations, by default @.

Parameters:
value : str

Character separating annotations

setCleanAnnotations(value)[source]#

Sets whether to remove annotation columns, by default True.

Parameters:
value : bool

Whether to remove annotation columns

setIncludeMetadata(value)[source]#

Sets whether to include annotation metadata, by default False.

Parameters:
value : bool

Whether to include annotation metadata

setOutputAsArray(value)[source]#

Sets whether to generate an array with the results instead of a string, by default True.

Parameters:
value : bool

Whether to generate an array with the results instead of a string

setParseEmbeddingsVectors(value)[source]#

Sets whether to include embeddings vectors in the process, by default False.

Parameters:
value : bool

Whether to include embeddings vectors in the process

getInputCols()[source]#

Gets the column names of the input annotations.

getOutputCols()[source]#

Gets the column names of the finished output annotations.