sparknlp.annotator.chunk2_doc
Contains classes for Chunk2Doc.
Module Contents
Classes

| Chunk2Doc | Converts a CHUNK type column back into DOCUMENT. |
class Chunk2Doc
Converts a CHUNK type column back into DOCUMENT.

Useful when trying to re-tokenize or do further analysis on a CHUNK result.

Input Annotation types: CHUNK

Output Annotation type: DOCUMENT

Parameters: None
 
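A minimal sketch of the re-tokenization use case, assuming an upstream stage (e.g., a pretrained NER pipeline) has already produced CHUNK annotations in an "entities" column; the column names here are illustrative:

>>> from sparknlp.annotator import Chunk2Doc, Tokenizer
>>> chunk2Doc = Chunk2Doc() \
...     .setInputCols(["entities"]) \
...     .setOutputCol("chunkConverted")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["chunkConverted"]) \
...     .setOutputCol("chunkTokens")

Since Tokenizer expects DOCUMENT input, Chunk2Doc is what makes the extracted chunks consumable by a tokenizer again.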
See Also

Doc2Chunk : for converting DOCUMENT annotations to CHUNK
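For the converse direction, a minimal Doc2Chunk sketch, assuming a "target" string column holding the chunk text to locate inside the document (column names are illustrative):

>>> from sparknlp.base import Doc2Chunk
>>> chunkAssembler = Doc2Chunk() \
...     .setInputCols("document") \
...     .setChunkCol("target") \
...     .setOutputCol("chunk")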
Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.pretrained import PretrainedPipeline

Location entities are extracted and converted back into DOCUMENT type for further processing.

>>> data = spark.createDataFrame([[1, "New York and New Jersey aren't that far apart actually."]]).toDF("id", "text")

Define the pretrained pipeline, which extracts Named Entities amongst other things, and apply Chunk2Doc to its result.

>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> chunkToDoc = Chunk2Doc().setInputCols("entities").setOutputCol("chunkConverted")
>>> explainResult = pipeline.transform(data)

Show results.

>>> result = chunkToDoc.transform(explainResult)
>>> result.selectExpr("explode(chunkConverted)").show(truncate=False)
+------------------------------------------------------------------------------+
|col                                                                           |
+------------------------------------------------------------------------------+
|[document, 0, 7, New York, [entity -> LOC, sentence -> 0, chunk -> 0], []]    |
|[document, 13, 22, New Jersey, [entity -> LOC, sentence -> 0, chunk -> 1], []]|
+------------------------------------------------------------------------------+
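Note how the chunk metadata (entity label, sentence index, and chunk index) is carried over into the resulting DOCUMENT annotations, so downstream stages can still tell which entity each converted document originated from.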