sparknlp.annotation
Contains the Annotation data format
Module Contents
Classes
Annotation | Represents the output of Spark NLP Annotators and their details.
- class Annotation(annotatorType, begin, end, result, metadata, embeddings)
Represents the output of Spark NLP Annotators and their details.
- Parameters:
- annotatorType : str
The type of the output of the annotator. Possible values are DOCUMENT, TOKEN, WORDPIECE, WORD_EMBEDDINGS, SENTENCE_EMBEDDINGS, CATEGORY, DATE, ENTITY, SENTIMENT, POS, CHUNK, NAMED_ENTITY, NEGEX, DEPENDENCY, LABELED_DEPENDENCY, LANGUAGE, KEYWORD, DUMMY.
- begin : int
The index of the first character under this annotation.
- end : int
The index of the last character under this annotation.
- result : str
The resulting string of the annotation.
- metadata : dict
Associated metadata for this annotation.
- embeddings : list
Embeddings vector, where applicable.
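As a rough illustration of how the six fields fit together, the sketch below uses a small stand-in dataclass. `AnnotationSketch` is a hypothetical name, not the sparknlp class. It assumes that `begin` and `end` are zero-based character offsets, that `end` is inclusive (as "index of the last character" suggests), and that the TOKEN type is spelled as the lowercase string `"token"`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in mirroring the documented fields; not the sparknlp class."""
    annotatorType: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)
    embeddings: List[float] = field(default_factory=list)


text = "Spark NLP annotates text"

# A TOKEN-style annotation covering the word "NLP".
# Assumption: offsets are zero-based and `end` is inclusive.
token = AnnotationSketch("token", 6, 8, "NLP", {"sentence": "0"}, [])

# Because `end` points at the last character, a Python slice needs end + 1.
covered = text[token.begin:token.end + 1]
print(covered)
```

Under these assumptions, slicing the source text with `begin` and `end + 1` recovers the annotation's `result`.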
- copy(result)
Creates a new Annotation with a different result, keeping all other fields of this Annotation.
- Parameters:
- result : str
The result string to use in the new Annotation.
- Returns:
- Annotation
Newly created Annotation
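The behaviour described for `copy` can be sketched with a stand-in class. `AnnotationSketch` is hypothetical and only mirrors the documented fields. The sketch assumes that every field except `result` is carried over unchanged, so `begin` and `end` are not recomputed for the new string.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class AnnotationSketch:
    """Hypothetical stand-in mirroring the documented fields; not the sparknlp class."""
    annotatorType: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str]
    embeddings: List[float]

    def copy(self, result: str) -> "AnnotationSketch":
        # Assumption: all fields other than `result` are passed through as-is.
        return replace(self, result=result)


original = AnnotationSketch("token", 0, 4, "Spark", {"sentence": "0"}, [])

# Derive a lower-cased variant; the original annotation is left untouched.
lowered = original.copy(original.result.lower())
print(lowered.result, lowered.begin, lowered.end)
```

A copy of this kind suits post-processing steps that rewrite the text of an annotation (for example normalization) while keeping its position and metadata.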
- static dataType()
Returns a Spark StructType that represents the schema of the Annotation.
The schema looks like:
struct (containsNull = True)
 |-- annotatorType: string (nullable = False)
 |-- begin: integer (nullable = False)
 |-- end: integer (nullable = False)
 |-- result: string (nullable = False)
 |-- metadata: map (nullable = False)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = True)
 |-- embeddings: array (nullable = False)
 |    |-- element: float (containsNull = False)
- Returns:
pyspark.sql.types.StructType
Spark Schema of the Annotation
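To make the nullability rules in the schema concrete, here is a plain-Python sketch that checks a dictionary against the documented layout. It does not use Spark: `EXPECTED_FIELDS` and `matches_schema` are hypothetical helpers written for illustration, whereas the real method returns a `pyspark.sql.types.StructType`.

```python
# Hypothetical plain-Python mirror of the documented schema (not a Spark type).
EXPECTED_FIELDS = {
    "annotatorType": str,
    "begin": int,
    "end": int,
    "result": str,
    "metadata": dict,
    "embeddings": list,
}


def matches_schema(record: dict) -> bool:
    """Return True if `record` has the documented fields with compatible values."""
    if set(record) != set(EXPECTED_FIELDS):
        return False
    # Top-level fields are non-nullable, and None never passes isinstance.
    if not all(isinstance(record[name], kind)
               for name, kind in EXPECTED_FIELDS.items()):
        return False
    # Map keys are strings; map values may be null (valueContainsNull = True).
    if not all(isinstance(key, str) and (value is None or isinstance(value, str))
               for key, value in record["metadata"].items()):
        return False
    # Array elements must be non-null floats (containsNull = False).
    return all(isinstance(x, float) for x in record["embeddings"])


good = {
    "annotatorType": "token", "begin": 0, "end": 4, "result": "Spark",
    "metadata": {"sentence": "0", "note": None}, "embeddings": [0.1, 0.2],
}
bad = dict(good, embeddings=[0.1, None])  # null element is not allowed

print(matches_schema(good), matches_schema(bad))
```

The null metadata value is accepted while the null embedding element is rejected, which matches the `valueContainsNull = True` and `containsNull = False` flags in the schema.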
- static arrayType()
Returns a Spark ArrayType that contains the dataType of the annotation.
- Returns:
pyspark.sql.types.ArrayType
ArrayType with the Annotation data type embedded.
- static fromRow(row)
Creates an Annotation from a Spark Row.
- Parameters:
- row : pyspark.sql.Row
Spark row containing columns for annotatorType, begin, end, result, metadata, embeddings.
- Returns:
- Annotation
The new Annotation.
- static toRow(annotation)
Transforms an Annotation to a Spark Row.
- Parameters:
- annotation : Annotation
The Annotation to be transformed.
- Returns:
pyspark.sql.Row
The new Row.
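The relationship between `fromRow` and `toRow` can be sketched as a round trip. The example below avoids Spark entirely: `RowSketch` and `AnnotationSketch` are hypothetical named tuples standing in for `pyspark.sql.Row` and `Annotation`, and `to_row` and `from_row` are illustrative helpers, not the library functions. It assumes the row carries the six columns in the documented order and exposes them as attributes.

```python
from collections import namedtuple

FIELDS = ["annotatorType", "begin", "end", "result", "metadata", "embeddings"]

# Hypothetical stand-ins: RowSketch plays the role of pyspark.sql.Row and
# AnnotationSketch the role of Annotation; neither is the library class.
RowSketch = namedtuple("RowSketch", FIELDS)
AnnotationSketch = namedtuple("AnnotationSketch", FIELDS)


def to_row(annotation):
    # Assumption: fields are emitted in the documented column order.
    return RowSketch(*(getattr(annotation, name) for name in FIELDS))


def from_row(row):
    # Assumption: the row exposes each column as an attribute of the same name.
    return AnnotationSketch(*(getattr(row, name) for name in FIELDS))


annotation = AnnotationSketch("token", 0, 4, "Spark", {"sentence": "0"}, [])
row = to_row(annotation)
restored = from_row(row)

print(row.result, restored == annotation)
```

If both conversions copy the six fields without altering them, as assumed here, converting to a row and back yields an equal annotation.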