Description
text_cleaning is a pretrained pipeline for basic text cleaning. It tokenizes the input, normalizes the tokens, removes stop words, lemmatizes what remains, and reassembles the cleaned tokens into text. It covers most of the common text preprocessing tasks and can be applied to a single string or to your DataFrame.
How to use
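import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Download the pretrained pipeline and annotate a single string
pipeline = PretrainedPipeline("text_cleaning", "en")
result = pipeline.annotate("I love johnsnowlabs! ")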
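For a single input string, annotate returns a Python dictionary that maps each output column name to a list of strings. This card does not list the output column names of the pipeline, so the sketch below does not assume them and simply walks whatever keys are present. The helper name describe_annotations and the sample dictionary are hypothetical and exist only for illustration.

```python
def describe_annotations(result):
    """Return one summary line per output column in an annotate() result."""
    lines = []
    for column, values in sorted(result.items()):
        lines.append(f"{column}: {len(values)} item(s) -> {values}")
    return lines


# Hypothetical stand-in for the dictionary returned by pipeline.annotate(...).
# The real keys depend on the output column names defined in the pipeline.
sample_result = {
    "document": ["I love johnsnowlabs! "],
    "token": ["I", "love", "johnsnowlabs", "!"],
}

for line in describe_annotations(sample_result):
    print(line)
```

With a live pipeline, passing the result variable from the snippet above instead of sample_result should list the columns that this pipeline actually produces.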
Model Information
| Model Name: | text_cleaning |
| Type: | pipeline |
| Compatibility: | Spark NLP 4.0.0+ |
| License: | Open Source |
| Edition: | Official |
| Language: | en |
| Size: | 934.6 KB |
Included Models
- DocumentAssembler
- TokenizerModel
- NormalizerModel
- StopWordsCleaner
- LemmatizerModel
- TokenAssembler
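The order of the stages above can be illustrated with a plain-Python sketch. This is not Spark NLP code and does not reproduce the pretrained models: the stop word set, the lemma dictionary, and the normalization rule (letters only, lowercased) are toy assumptions chosen to show how each stage feeds the next.

```python
import re

# Toy resources, hypothetical and far smaller than what pretrained models use.
STOP_WORDS = {"i", "the", "a", "an", "is", "are", "to", "of"}
LEMMAS = {"loves": "love", "loved": "love", "cats": "cat", "running": "run"}


def clean_text(text):
    # Document assembly + tokenization: split the raw text on whitespace.
    tokens = text.split()
    # Normalization: keep letters only and lowercase (an assumed configuration),
    # then drop tokens that became empty, such as lone punctuation marks.
    tokens = [re.sub(r"[^A-Za-z]", "", t).lower() for t in tokens]
    tokens = [t for t in tokens if t]
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization by dictionary lookup; unknown words pass through unchanged.
    tokens = [LEMMAS.get(t, t) for t in tokens]
    # Token assembly: join the surviving tokens back into one string.
    return " ".join(tokens)


print(clean_text("I loved the cats!"))
```

The actual output of text_cleaning depends on the settings and dictionaries bundled with its pretrained stages, so treat this only as a mental model of the data flow.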