Description
Spell Checker is a sequence-to-sequence model that detects and corrects spelling errors in your input text. It’s based on Levenshtein Automaton for generating candidate corrections and a Neural Language Model for ranking corrections.
Predicted Entities
Live Demo Open in Colab Download Copy S3 URI
How to use
documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
tokenizer = RecursiveTokenizer()\
.setInputCols(["document"])\
.setOutputCol("token")\
.setPrefixes(["\"", "“", "(", "[", "\n", "."]) \
.setSuffixes(["\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"])
spellModel = ContextSpellCheckerModel\
.pretrained("spellcheck_dl", "en")\
.setInputCols("token")\
.setOutputCol("checked")\
pipeline = Pipeline(stages = [documentAssembler, tokenizer, spellModel])
empty_df = spark.createDataFrame([[""]]).toDF("text")
lp = LightPipeline(pipeline.fit(empty_df))
text = ["During the summer we have the best ueather.", "I have a black ueather jacket, so nice."]
lp.annotate(text)
val assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val tokenizer = new RecursiveTokenizer()
.setInputCols(Array("document"))
.setOutputCol("token")
.setPrefixes(Array("\"", "“", "(", "[", "\n", "."))
.setSuffixes(Array("\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"))
val spellChecker = ContextSpellCheckerModel.
pretrained("spellcheck_dl", "en").
setInputCols("token").
setOutputCol("checked")
val pipeline = new Pipeline().setStages(Array(assembler, tokenizer, spellChecker))
val empty_df = spark.createDataFrame([[""]]).toDF("text")
val lp = new LightPipeline(pipeline.fit(empty_df))
val text = Array("During the summer we have the best ueather.", "I have a black ueather jacket, so nice.")
lp.annotate(text)
import nlu
nlu.load("spell").predict("""During the summer we have the best ueather.""")
Results
[{'checked': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'weather', '.'],
'document': ['During the summer we have the best ueather.'],
'token': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'ueather', '.']},
{'checked': ['I', 'have', 'a', 'black', 'leather', 'jacket', ',', 'so', 'nice', '.'],
'document': ['I have a black ueather jacket, so nice.'],
'token': ['I', 'have', 'a', 'black', 'ueather', 'jacket', ',', 'so', 'nice', '.']}]
Model Information
Model Name: | spellcheck_dl |
Compatibility: | Spark NLP 3.4.1+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [token] |
Output Labels: | [corrected] |
Language: | en |
Size: | 99.7 MB |
References
Combination of custom data sets.