`sparknlp.annotation`

Contains the Annotation data format

Module Contents

Classes

Annotation

Represents the output of Spark NLP Annotators and their details.

class Annotation(annotatorType, begin, end, result, metadata, embeddings)

Represents the output of Spark NLP Annotators and their details.

Parameters:

annotatorType (str): The type of the output of the annotator. Possible values are DOCUMENT, TOKEN, WORDPIECE, WORD_EMBEDDINGS, SENTENCE_EMBEDDINGS, CATEGORY, DATE, ENTITY, SENTIMENT, POS, CHUNK, NAMED_ENTITY, NEGEX, DEPENDENCY, LABELED_DEPENDENCY, LANGUAGE, KEYWORD, DUMMY.
begin (int): The index of the first character under this annotation.
end (int): The index of the last character under this annotation.
result (str): The resulting string of the annotation.
metadata (dict): Associated metadata for this annotation.
embeddings (list): Embeddings vector, where applicable.
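
To make the field semantics concrete, here is a minimal sketch that uses a plain-Python stand-in rather than the library class. `AnnotationSketch` and the sample text are hypothetical; only the six field names and their order come from the signature above. The sketch assumes the offsets are zero-based, and it relies on the parameter description that `end` is the index of the last character, so it is inclusive.

```python
from dataclasses import dataclass


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in that mirrors the six documented fields."""
    annotatorType: str
    begin: int
    end: int
    result: str
    metadata: dict
    embeddings: list


text = "Spark NLP annotates text"

# A token covering "NLP": characters 6, 7 and 8 (zero-based offsets, assumed).
token = AnnotationSketch("token", 6, 8, "NLP", {"sentence": "0"}, [])

# `end` points at the last character, so a Python slice needs end + 1.
covered = text[token.begin:token.end + 1]
print(covered)
```

With Spark NLP installed, the real class takes the same six arguments in the same order.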

annotatorType

begin

end

result

metadata

embeddings

copy(result)

Creates a new Annotation with a different result, keeping all other settings of this Annotation.

Parameters:

result (str): The result string to use in the new Annotation.

Returns:

Annotation: Newly created Annotation
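
The behaviour described above can be sketched without Spark NLP installed. `AnnotationSketch` below is a hypothetical stand-in, not the library class; its `copy` follows the description: a new object is returned, only `result` differs, and the original is left untouched.

```python
class AnnotationSketch:
    """Hypothetical stand-in; not the library class."""

    def __init__(self, annotatorType, begin, end, result, metadata, embeddings):
        self.annotatorType = annotatorType
        self.begin = begin
        self.end = end
        self.result = result
        self.metadata = metadata
        self.embeddings = embeddings

    def copy(self, result):
        # New object: only `result` changes, every other field is carried over.
        return AnnotationSketch(self.annotatorType, self.begin, self.end,
                                result, self.metadata, self.embeddings)


original = AnnotationSketch("token", 0, 4, "Spark", {"sentence": "0"}, [])
lowered = original.copy(original.result.lower())

print(original.result, lowered.result)
```

This pattern suits normalisation steps, where the text changes but the offsets should still point at the original span.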

static dataType()

Returns a Spark StructType that represents the schema of the Annotation.

The schema looks like:

struct (containsNull = True)
|-- annotatorType: string (nullable = False)
|-- begin: integer (nullable = False)
|-- end: integer (nullable = False)
|-- result: string (nullable = False)
|-- metadata: map (nullable = False)
|    |-- key: string
|    |-- value: string (valueContainsNull = True)
|-- embeddings: array (nullable = False)
|    |-- element: float (containsNull = False)

Returns:

pyspark.sql.types.StructType: Spark Schema of the Annotation

static arrayType()

Returns a Spark ArrayType whose element type is the Annotation schema returned by dataType().

Returns:

pyspark.sql.types.ArrayType: ArrayType with the Annotation data type embedded.

static fromRow(row)

Creates an Annotation from a Spark Row.

Parameters:

row (pyspark.sql.Row): Spark row containing columns for annotatorType, begin, end, result, metadata, embeddings.

Returns:

Annotation: The new Annotation.
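
As a rough illustration of the mapping, the sketch below reads the six documented columns by name and passes them to the constructor. Both `RowSketch` (a `namedtuple` standing in for `pyspark.sql.Row`) and `AnnotationSketch` are hypothetical stand-ins, so the snippet runs without Spark. It shows the idea only and is not the library's implementation.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-in for a pyspark.sql.Row with the six documented columns.
RowSketch = namedtuple(
    "RowSketch",
    ["annotatorType", "begin", "end", "result", "metadata", "embeddings"],
)


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in; not the library class."""
    annotatorType: str
    begin: int
    end: int
    result: str
    metadata: dict
    embeddings: list

    @staticmethod
    def fromRow(row):
        # Read each documented column by name and hand it to the constructor.
        return AnnotationSketch(row.annotatorType, row.begin, row.end,
                                row.result, row.metadata, row.embeddings)


row = RowSketch("document", 0, 10, "Hello world", {}, [])
doc = AnnotationSketch.fromRow(row)
print(doc.result)
```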

static toRow(annotation)

Transforms an Annotation to a Spark Row.

Parameters:

annotation (Annotation): The Annotation to be transformed.

Returns:

pyspark.sql.Row: The new Row.
