sparknlp.annotator.chunk2_doc
Contains classes for Chunk2Doc.
Module Contents

Classes

Chunk2Doc: Converts a CHUNK type column back into DOCUMENT.

class Chunk2Doc
Converts a CHUNK type column back into DOCUMENT. Useful when trying to re-tokenize or do further analysis on a CHUNK result.

Input Annotation types: CHUNK

Output Annotation type: DOCUMENT

Parameters: None
See also
Doc2Chunk: for converting DOCUMENT annotations to CHUNK
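In practice, Chunk2Doc follows an annotator that emits CHUNK annotations, such as NerConverter. Below is a minimal sketch of where it sits in a hand-assembled pipeline; the pretrained model names ("glove_100d", "ner_dl") are assumptions chosen for illustration:

>>> from pyspark.ml import Pipeline
>>> from sparknlp.base import DocumentAssembler
>>> from sparknlp.annotator import Tokenizer, WordEmbeddingsModel, NerDLModel, NerConverter, Chunk2Doc
>>> documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
>>> tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
>>> embeddings = WordEmbeddingsModel.pretrained("glove_100d").setInputCols(["document", "token"]).setOutputCol("embeddings")  # assumed model name
>>> ner = NerDLModel.pretrained("ner_dl").setInputCols(["document", "token", "embeddings"]).setOutputCol("ner")  # assumed model name
>>> nerConverter = NerConverter().setInputCols(["document", "token", "ner"]).setOutputCol("entities")  # NAMED_ENTITY -> CHUNK
>>> chunk2Doc = Chunk2Doc().setInputCols(["entities"]).setOutputCol("chunkDoc")  # CHUNK -> DOCUMENT
>>> pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings, ner, nerConverter, chunk2Doc])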
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.pretrained import PretrainedPipeline
Location entities are extracted and converted back into DOCUMENT type for further processing.

>>> data = spark.createDataFrame([[1, "New York and New Jersey aren't that far apart actually."]]).toDF("id", "text")
Define a pretrained pipeline that extracts Named Entities, amongst other things, and apply Chunk2Doc to its output.
>>> pipeline = PretrainedPipeline("explain_document_dl")
>>> chunkToDoc = Chunk2Doc().setInputCols("entities").setOutputCol("chunkConverted")
>>> explainResult = pipeline.transform(data)
Show results.
>>> result = chunkToDoc.transform(explainResult)
>>> result.selectExpr("explode(chunkConverted)").show(truncate=False)
+------------------------------------------------------------------------------+
|col                                                                           |
+------------------------------------------------------------------------------+
|[document, 0, 7, New York, [entity -> LOC, sentence -> 0, chunk -> 0], []]    |
|[document, 13, 22, New Jersey, [entity -> LOC, sentence -> 0, chunk -> 1], []]|
+------------------------------------------------------------------------------+
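Since the converted annotations are of type DOCUMENT, they can be passed to any annotator that expects DOCUMENT input. A minimal sketch that re-tokenizes the extracted entities, assuming the result DataFrame and chunkConverted column from the example above:

>>> from sparknlp.annotator import Tokenizer
>>> chunkTokenizer = Tokenizer().setInputCols(["chunkConverted"]).setOutputCol("chunkToken")
>>> retokenized = chunkTokenizer.fit(result).transform(result)
>>> retokenized.selectExpr("explode(chunkToken.result)").show(truncate=False)

This should yield one row per token within each entity chunk (here, "New", "York", "New", "Jersey").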