sparknlp.annotation

Contains the Annotation data format.

Module Contents

Classes

Annotation

Represents the output of Spark NLP Annotators and their details.

class Annotation(annotatorType, begin, end, result, metadata, embeddings)

Represents the output of Spark NLP Annotators and their details.

Parameters:
annotatorType : str

The type of the output of the annotator. Possible values are DOCUMENT, TOKEN, WORDPIECE, WORD_EMBEDDINGS, SENTENCE_EMBEDDINGS, CATEGORY, DATE, ENTITY, SENTIMENT, POS, CHUNK, NAMED_ENTITY, NEGEX, DEPENDENCY, LABELED_DEPENDENCY, LANGUAGE, KEYWORD, DUMMY.

begin : int

The index of the first character under this annotation.

end : int

The index of the last character under this annotation.

result : str

The resulting string of the annotation.

metadata : dict

Associated metadata for this annotation.

embeddings : list

Embeddings vector, where applicable.
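The six fields can be pictured with a small stand-in class. This is only a sketch that mirrors the documented parameters: `AnnotationSketch` is a hypothetical dataclass, not the library's `Annotation`, and the lowercase `"token"` label and the `"sentence"` metadata key are assumed values for illustration. It shows that `end` is the index of the last character, so the annotated span is recovered with an inclusive upper bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in mirroring the six documented fields."""
    annotator_type: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)
    embeddings: List[float] = field(default_factory=list)


text = "Spark NLP is fast"

# "NLP" occupies characters 6..8. `end` points at the last character,
# so it is inclusive, unlike the stop index of a Python slice.
token = AnnotationSketch("token", begin=6, end=8, result="NLP",
                         metadata={"sentence": "0"})

# Recover the annotated span from the original text.
span = text[token.begin:token.end + 1]
print(span)
```

Forgetting the `+ 1` when slicing would drop the final character of the span.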

copy(result)

Creates a new Annotation with a different result, keeping all other fields of this Annotation.

Parameters:
result : str

The result string for the new Annotation.

Returns:
Annotation

Newly created Annotation
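The copy semantics described here (same fields, new result) can be imitated without Spark NLP installed. The sketch below uses a hypothetical frozen dataclass, `AnnotationSketch`, together with `dataclasses.replace`; it illustrates the idea and is not the library's implementation.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class AnnotationSketch:
    """Hypothetical stand-in for Annotation; not the library class."""
    annotator_type: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str]
    embeddings: List[float]

    def copy(self, result):
        # Build a new object that differs only in `result`.
        return replace(self, result=result)


original = AnnotationSketch("token", 0, 4, "Spark", {"sentence": "0"}, [])

# Normalise the text while keeping offsets and metadata untouched.
lowered = original.copy(original.result.lower())
print(lowered.result, lowered.begin, lowered.end)
```

Because the offsets are carried over unchanged, they still refer to the span in the original text even when the new result has a different length.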

static dataType()

Returns a Spark StructType that represents the schema of the Annotation.

The Schema looks like:

struct (containsNull = True)
|-- annotatorType: string (nullable = False)
|-- begin: integer (nullable = False)
|-- end: integer (nullable = False)
|-- result: string (nullable = False)
|-- metadata: map (nullable = False)
|    |-- key: string
|    |-- value: string (valueContainsNull = True)
|-- embeddings: array (nullable = False)
|    |-- element: float (containsNull = False)

Returns:
pyspark.sql.types.StructType

Spark Schema of the Annotation

static arrayType()

Returns a Spark ArrayType that contains the dataType of the Annotation.

Returns:
pyspark.sql.types.ArrayType

ArrayType with the Annotation data type embedded.

static fromRow(row)

Creates an Annotation from a Spark Row.

Parameters:
row : pyspark.sql.Row

Spark row containing columns for annotatorType, begin, end, result, metadata, embeddings.

Returns:
Annotation

The new Annotation.

static toRow(annotation)

Transforms an Annotation to a Spark Row.

Parameters:
annotation : Annotation

The Annotation to be transformed.

Returns:
pyspark.sql.Row

The new Row.
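The relationship between fromRow and toRow can be sketched as a round trip. The example below runs without Spark: `RowSketch` is a hypothetical `namedtuple` standing in for `pyspark.sql.Row`, and `AnnotationSketch` with its `from_row` and `to_row` helpers is a hypothetical stand-in for the Annotation class. Only the column names and their order are taken from the documentation above.

```python
from collections import namedtuple

# Column names and order as listed for fromRow above.
FIELDS = ("annotatorType", "begin", "end", "result", "metadata", "embeddings")

# Hypothetical stand-in for pyspark.sql.Row, so no Spark install is needed.
RowSketch = namedtuple("RowSketch", FIELDS)


class AnnotationSketch:
    """Hypothetical stand-in for Annotation; not the library class."""

    def __init__(self, annotator_type, begin, end, result, metadata, embeddings):
        self.annotator_type = annotator_type
        self.begin = begin
        self.end = end
        self.result = result
        self.metadata = metadata
        self.embeddings = embeddings

    @staticmethod
    def from_row(row):
        # Read each documented column from the row by name.
        return AnnotationSketch(row.annotatorType, row.begin, row.end,
                                row.result, row.metadata, row.embeddings)

    @staticmethod
    def to_row(annotation):
        # Emit the fields in the same column order.
        return RowSketch(annotation.annotator_type, annotation.begin,
                         annotation.end, annotation.result,
                         annotation.metadata, annotation.embeddings)


row = RowSketch("token", 0, 4, "Spark", {"sentence": "0"}, [])
annotation = AnnotationSketch.from_row(row)
round_tripped = AnnotationSketch.to_row(annotation)
print(round_tripped == row)
```

With the real classes, the same pattern would apply to rows taken from an annotation column of a DataFrame, assuming the row carries all six columns.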