Annotation
The basic result of a Spark NLP operation is an annotation. Its structure includes:
annotatorType
: the type of annotator that generated the current annotation
begin
: the beginning of the matched content, relative to the raw text
end
: the end of the matched content, relative to the raw text
result
: the main output of the annotation
metadata
: content of the matched result and additional information
embeddings
: (new in 2.0) contains vector mappings if required
This object is generated automatically by annotators after a transform process. No manual work is required. However, it is important to understand the structure of an annotation clearly in order to use it efficiently.
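To make the field layout concrete, here is a minimal sketch in plain Python. `AnnotationSketch` is a hypothetical stand-in written for illustration, not the Spark NLP class itself; it only mirrors the field names listed above, and the field types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in that mirrors the fields of an annotation."""
    annotatorType: str                                   # annotator that produced it, e.g. "pos"
    begin: int                                           # start offset in the raw text
    end: int                                             # end offset in the raw text
    result: str                                          # main output, e.g. a POS tag
    metadata: Dict[str, str] = field(default_factory=dict)
    embeddings: List[float] = field(default_factory=list)


# Illustrative values only: a part-of-speech annotation for the word "Spark"
ann = AnnotationSketch("pos", 24, 28, "NNP", {"word": "Spark"})
print(ann.annotatorType, ann.begin, ann.end, ann.result, ann.metadata)
```

In real use these objects are never built by hand; the sketch is only meant to show which piece of information lives in which field.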
For example, the annotation could look like this (using Pretrained Pipelines):
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_ml")
explain_document_ml download started this may take some time.
Approx size to download 9.1 MB
[OK!]
>>> data = spark.createDataFrame([["We are very happy about Spark NLP"]]).toDF("text")
>>> result = explain_document_pipeline.model.transform(data).selectExpr("explode(pos)")
>>> result.show(truncate=False)
+---------------------------------------+
|col                                    |
+---------------------------------------+
|[pos, 0, 1, PRP, [word -> We], []]     |
|[pos, 3, 5, VBP, [word -> are], []]    |
|[pos, 7, 10, RB, [word -> very], []]   |
|[pos, 12, 16, JJ, [word -> happy], []] |
|[pos, 18, 22, IN, [word -> about], []] |
|[pos, 24, 28, NNP, [word -> Spark], []]|
|[pos, 30, 32, NNP, [word -> NLP], []]  |
+---------------------------------------+
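In this output, begin and end are character offsets into the input text, and the end offset appears to be inclusive. The following sanity check in plain Python is a sketch under that assumption; the rows are copied by hand from the output above, so no Spark session is needed.

```python
text = "We are very happy about Spark NLP"

# (begin, end, result, word) copied by hand from the pipeline output
rows = [
    (0, 1, "PRP", "We"),
    (3, 5, "VBP", "are"),
    (7, 10, "RB", "very"),
    (12, 16, "JJ", "happy"),
    (18, 22, "IN", "about"),
    (24, 28, "NNP", "Spark"),
    (30, 32, "NNP", "NLP"),
]

# Assumption: end is inclusive, so the slice has to extend one past it
spans = [text[begin:end + 1] for begin, end, _, _ in rows]
offsets_match = spans == [word for _, _, _, word in rows]
print(offsets_match)
```

If the check holds, each annotation can be mapped back to the exact substring of the raw text that produced it.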