sparknlp.annotator.chunk2_doc

Contains classes for Chunk2Doc.

Module Contents

Classes
Chunk2Doc: Converts a CHUNK type column back into DOCUMENT.
class Chunk2Doc
Converts a CHUNK type column back into DOCUMENT. Useful when trying to re-tokenize or do further analysis on a CHUNK result.

Input Annotation types: CHUNK
Output Annotation type: DOCUMENT
Parameters: None
See also: Doc2Chunk, for converting DOCUMENT annotations to CHUNK.
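At the annotation-type level the two converters are mirror images. A rough sketch of both directions, assuming an "entities" CHUNK column from an upstream NER step and a string column "target" for Doc2Chunk; the column names and import path here are illustrative, not taken from this page:

>>> from sparknlp.annotator import Chunk2Doc, Doc2Chunk
>>> # CHUNK -> DOCUMENT: each chunk becomes its own document annotation
>>> chunk2doc = Chunk2Doc().setInputCols(["entities"]).setOutputCol("entityDocs")
>>> # DOCUMENT -> CHUNK: marks the text held in the "target" column as a chunk
>>> doc2chunk = Doc2Chunk().setInputCols(["document"]).setChunkCol("target").setOutputCol("chunk")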
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.pretrained import PretrainedPipeline
Location entities are extracted and converted back into DOCUMENT type for further processing.

>>> data = spark.createDataFrame([[1, "New York and New Jersey aren't that far apart actually."]]).toDF("id", "text")
Define a pretrained pipeline that extracts Named Entities, amongst other things, and apply Chunk2Doc to it.
>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> chunkToDoc = Chunk2Doc().setInputCols("entities").setOutputCol("chunkConverted")
>>> explainResult = pipeline.transform(data)
Show results.
>>> result = chunkToDoc.transform(explainResult)
>>> result.selectExpr("explode(chunkConverted)").show(truncate=False)
+------------------------------------------------------------------------------+
|col                                                                           |
+------------------------------------------------------------------------------+
|[document, 0, 7, New York, [entity -> LOC, sentence -> 0, chunk -> 0], []]    |
|[document, 13, 22, New Jersey, [entity -> LOC, sentence -> 0, chunk -> 1], []]|
+------------------------------------------------------------------------------+
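Because "chunkConverted" now holds DOCUMENT annotations, the extracted entities can be re-tokenized, as the class description suggests. A minimal continuation of the example above; the Tokenizer stage and the "chunkTokens" column are assumptions added here for illustration:

>>> from sparknlp.annotator import Tokenizer
>>> tokenizer = Tokenizer() \
...     .setInputCols(["chunkConverted"]) \
...     .setOutputCol("chunkTokens")
>>> retokenized = tokenizer.fit(result).transform(result)
>>> retokenized.selectExpr("explode(chunkTokens.result)").show(truncate=False)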