Text Cleaning

Description

text_cleaning is a pretrained pipeline that performs basic text processing and cleaning. It covers the most common preprocessing steps (tokenization, normalization, stop word removal, and lemmatization) and can be applied to plain strings or to text in a DataFrame.

How to use

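import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # start a Spark session with Spark NLP loaded
pipeline = PretrainedPipeline("text_cleaning", "en")

# annotate() takes a plain string and returns a dict keyed by output column
result = pipeline.annotate("""I love johnsnowlabs!  """)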

Model Information

Model Name: text_cleaning
Type: pipeline
Compatibility: Spark NLP 4.4.2+
License: Open Source
Edition: Official
Language: en
Size: 944.5 KB

Included Models

  • DocumentAssembler
  • TokenizerModel
  • NormalizerModel
  • StopWordsCleaner
  • LemmatizerModel
  • TokenAssembler
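
To give a rough feel for what a chain of stages like the one above does to a string, here is a minimal plain-Python sketch. It is not the Spark NLP implementation and does not reproduce the pipeline's actual output: the stop word set, the lemma dictionary, the whitespace tokenizer, and all function names are hypothetical stand-ins chosen for illustration.

```python
import re

# Toy resources invented for this illustration only; the real pipeline
# ships its own stop word list and lemma dictionary.
STOP_WORDS = {"i", "the", "a", "an", "is", "are", "and", "to", "of"}
LEMMAS = {"loves": "love", "loved": "love", "running": "run", "cats": "cat"}


def tokenize(text):
    # Split on whitespace (a stand-in for a rule-based tokenizer).
    return text.split()


def normalize(tokens):
    # Strip every character that is not an ASCII letter, then drop empty tokens.
    cleaned = (re.sub(r"[^A-Za-z]", "", t) for t in tokens)
    return [t for t in cleaned if t]


def remove_stop_words(tokens):
    # Case-insensitive lookup against the toy stop word set.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def lemmatize(tokens):
    # Dictionary lookup; tokens without an entry are kept unchanged.
    return [LEMMAS.get(t.lower(), t) for t in tokens]


def assemble(tokens):
    # Join the surviving tokens back into a single cleaned string.
    return " ".join(tokens)


def clean_text(text):
    tokens = tokenize(text)
    tokens = normalize(tokens)
    tokens = remove_stop_words(tokens)
    tokens = lemmatize(tokens)
    return assemble(tokens)


print(clean_text("I love johnsnowlabs!  "))  # prints: love johnsnowlabs
```

The order mirrors the list above: tokens are produced first, cleaned and filtered next, and only then joined back into one string. The real annotators are configurable, so their results on the same input may differ from this toy version.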