Text Cleaning

Description

text_cleaning is a pretrained pipeline that performs basic text processing and cleaning steps. It covers the most common text preprocessing tasks (tokenization, normalization, stop word removal, and lemmatization) and can be applied to a single string or to the text column of a DataFrame.

How to use

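import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # start a Spark session with Spark NLP loaded
pipeline = PretrainedPipeline("text_cleaning", "en")  # downloads the pipeline on first use

# annotate() returns a dict mapping each output column to a list of strings
result = pipeline.annotate("""I love johnsnowlabs!  """)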

Model Information

Model Name:	text_cleaning
Type:	pipeline
Compatibility:	Spark NLP 4.0.0+
License:	Open Source
Edition:	Official
Language:	en
Size:	934.6 KB

Included Models

DocumentAssembler
TokenizerModel
NormalizerModel
StopWordsCleaner
LemmatizerModel
TokenAssembler
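
To give a feel for what the stages listed above do in sequence, here is a minimal pure-Python sketch. It is not the Spark NLP implementation: the stop word set, the lemma dictionary, and the function names are all hypothetical stand-ins, and the normalization rule (lowercasing, keeping letters only) is an assumption, since the pipeline's actual settings are not stated here. The real pipeline may therefore produce different output for the same input.

```python
import re

# Hypothetical toy resources: the real pipeline ships its own pretrained
# stop word list and lemma dictionary, which are not reproduced here.
STOP_WORDS = {"i", "a", "an", "the", "is", "are", "to", "of", "and"}
LEMMAS = {"loves": "love", "loved": "love", "pipelines": "pipeline", "cleans": "clean"}


def tokenize(text):
    """Split on whitespace (rough stand-in for TokenizerModel)."""
    return text.split()


def normalize(tokens):
    """Lowercase and keep letters only (assumed stand-in for NormalizerModel)."""
    cleaned = (re.sub(r"[^a-z]", "", tok.lower()) for tok in tokens)
    return [tok for tok in cleaned if tok]


def remove_stop_words(tokens):
    """Drop tokens found in the toy stop word set (stand-in for StopWordsCleaner)."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its lemma when the toy dictionary has one."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def clean_text(text):
    """Run the stages in the order the pipeline lists them."""
    tokens = tokenize(text)
    tokens = normalize(tokens)
    tokens = remove_stop_words(tokens)
    tokens = lemmatize(tokens)
    return " ".join(tokens)  # stand-in for TokenAssembler


print(clean_text("I love johnsnowlabs!  "))
```

The order matters in this sketch: stop words are matched after normalization so that "I" and "i" are treated alike, and tokens are rejoined only at the end, mirroring the role of TokenAssembler as the final stage.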
