Spell Checker for the English Language (Norvig)

Description

Detects and corrects spelling errors in the input text. The model is based on the Norvig-Sweeting approach, which Spark NLP exposes as the NorvigSweetingModel annotator.
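To give a feel for the idea behind Norvig-style correction, here is a minimal pure-Python sketch. It is not Spark NLP's implementation and does not reproduce the pretrained model: the `WORD_COUNTS` table holds made-up toy counts, and the helper names (`edits1`, `known`, `correct`) are illustrative only. The sketch generates candidates within one or two edits of a token and picks the most frequent known word.

```python
import string

# Toy word-frequency table (made-up counts, for illustration only).
WORD_COUNTS = {
    "sometimes": 50, "i": 500, "write": 80,
    "words": 60, "word": 30, "wrong": 40,
}

def edits1(word):
    """Return every string one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the candidates that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}

def correct(word):
    """Prefer the word itself, then edit distance 1, then distance 2."""
    candidates = (
        known([word])
        or known(edits1(word))
        or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
        or {word}  # nothing known nearby: leave the token unchanged
    )
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))

print([correct(w) for w in "somtimes i wrrite wordz erong".split()])
# → ['sometimes', 'i', 'write', 'words', 'wrong']
```

With this toy table, "wordz" is one edit away from both "word" and "words"; the higher count decides in favour of "words". A real model relies on a much larger dictionary and frequency table.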

How to use

# Python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Wrap the raw text column into Spark NLP's document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the default pretrained Norvig spell checker
spellChecker = NorvigSweetingModel.pretrained() \
    .setInputCols(["token"]) \
    .setOutputCol("spell")

pipeline = Pipeline().setStages([
    documentAssembler,
    tokenizer,
    spellChecker
])

data = spark.createDataFrame([["somtimes i wrrite wordz erong."]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("spell.result").show(truncate=False)

// Scala
// Assumes an active SparkSession named `spark` (for example, in spark-shell)
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel

import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker = NorvigSweetingModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("somtimes i wrrite wordz erong.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("spell.result").show(false)

# NLU
import nlu
nlu.load("en.spell.norvig").predict("""somtimes i wrrite wordz erong.""")

Results

+--------------------------------------+
|result                                |
+--------------------------------------+
|[sometimes, i, write, words, wrong, .]|
+--------------------------------------+
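The result column holds an array of corrected tokens rather than a sentence. If a plain string is needed, the tokens can be joined back together. The sketch below is a hypothetical helper (`join_tokens` is not part of Spark NLP) and assumes the token list has already been collected to the driver as a Python list; its punctuation handling is deliberately simplistic.

```python
def join_tokens(tokens):
    """Join tokens with spaces, attaching common punctuation to the previous token."""
    closing_punct = {".", ",", "!", "?", ";", ":"}
    text = ""
    for tok in tokens:
        # Add a space before every token except the first and closing punctuation
        if text and tok not in closing_punct:
            text += " "
        text += tok
    return text

# Token list copied from the results table above
corrected = ["sometimes", "i", "write", "words", "wrong", "."]
print(join_tokens(corrected))
# → sometimes i write words wrong.
```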

Model Information

Model Name: spellcheck_norvig
Compatibility: Spark NLP 2.0.2+
License: Open Source
Edition: Official
Input Labels: [token]
Output Labels: [checked]
Language: en