Description
text_cleaning is a pretrained pipeline that performs basic text processing and cleaning. It runs the most common preprocessing tasks on the text in your DataFrame: tokenization, normalization, stop word removal, and lemmatization.
How to use
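import sparknlp
from sparknlp.pretrained import PretrainedPipeline
spark = sparknlp.start()  # start a Spark session with Spark NLP loaded
pipeline = PretrainedPipeline("text_cleaning", "en")  # downloads the pipeline on first use
result = pipeline.annotate("""I love johnsnowlabs! """)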
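When called on a single string, `annotate` returns a plain Python dictionary that maps each output column to a list of strings. The column names depend on how the pipeline stages were configured and are not listed on this page, so a small helper that prints whatever keys are present can be a convenient way to inspect the result. A minimal sketch, in which `summarize_annotations` and the sample dictionary (including its column names) are hypothetical and serve only as illustration:

```python
from typing import Dict, List


def summarize_annotations(result: Dict[str, List[str]]) -> Dict[str, int]:
    """Print each output column with its values and return the item count per column."""
    counts: Dict[str, int] = {}
    for column, values in sorted(result.items()):
        print(f"{column}: {values}")
        counts[column] = len(values)
    return counts


# Hypothetical stand-in for the dictionary returned by pipeline.annotate(...).
# The real column names and values come from the pipeline itself.
sample = {
    "document": ["I love johnsnowlabs!"],
    "token": ["I", "love", "johnsnowlabs", "!"],
}
counts = summarize_annotations(sample)
```

To inspect a real run, pass the `result` from the call to `pipeline.annotate` in place of `sample`.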
Model Information
| Model Name: | text_cleaning |
| Type: | pipeline |
| Compatibility: | Spark NLP 4.4.2+ |
| License: | Open Source |
| Edition: | Official |
| Language: | en |
| Size: | 944.5 KB |
Included Models
- DocumentAssembler
- TokenizerModel
- NormalizerModel
- StopWordsCleaner
- LemmatizerModel
- TokenAssembler
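
The stages listed above run in order, each consuming the output of the previous one. The following pure-Python sketch illustrates that flow only. It is not the Spark NLP implementation: the function names, the tiny stop word set, and the lemma dictionary are all hypothetical, and the real pipeline may differ in details such as casing and punctuation handling.

```python
import re
from typing import Dict, List, Set

# Toy resources (hypothetical), far smaller than what a pretrained model ships with.
STOP_WORDS: Set[str] = {"i", "the", "a", "an", "is", "are", "and", "of", "to"}
LEMMAS: Dict[str, str] = {"loves": "love", "loved": "love", "models": "model"}


def tokenize(text: str) -> List[str]:
    """Split on whitespace (stands in for the tokenizer stage)."""
    return text.split()


def normalize(tokens: List[str]) -> List[str]:
    """Lowercase, strip non-alphanumeric characters, and drop empty tokens."""
    cleaned = (re.sub(r"[^a-z0-9]", "", token.lower()) for token in tokens)
    return [token for token in cleaned if token]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop tokens found in the toy stop word set."""
    return [token for token in tokens if token not in STOP_WORDS]


def lemmatize(tokens: List[str]) -> List[str]:
    """Map each token to its lemma when the toy dictionary has one."""
    return [LEMMAS.get(token, token) for token in tokens]


def clean_text(text: str) -> str:
    """Apply the stages in the listed order and join the tokens back into a string."""
    tokens = tokenize(text)
    tokens = normalize(tokens)
    tokens = remove_stop_words(tokens)
    tokens = lemmatize(tokens)
    return " ".join(tokens)  # stands in for the token assembler stage


cleaned = clean_text("I loved the Spark NLP models!")
print(cleaned)  # toy output: love spark nlp model
```

Keeping each stage as a separate function mirrors how the pipeline composes independent annotators, which makes it easy to see what each step contributes to the final cleaned text.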