norvig

package norvig

Type Members

class NorvigSweetingApproach extends AnnotatorApproach[NorvigSweetingModel] with NorvigSweetingParams
Trains an annotator that retrieves tokens and automatically corrects those that are not found in an English dictionary, based on the algorithm by Peter Norvig.
The algorithm is based on a Bayesian approach to spell checking: given a word, it looks in the provided dictionary for the candidate with the highest probability of being the correct spelling.
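The selection rule above can be sketched in plain Scala, following Norvig's essay rather than the Spark NLP implementation. Candidates are the word itself if it is known, otherwise known words one edit away, otherwise known words two edits away, and the most frequent candidate wins. Word counts stand in for the prior P(c), and the edit-distance tiers replace a real error model. The object NorvigSketch, its methods, and the toy counts are hypothetical.

```scala
// A minimal sketch of Peter Norvig's spelling corrector in plain Scala.
// Hypothetical illustration only; this is NOT Spark NLP's NorvigSweeting code.
object NorvigSketch {
  private val letters = "abcdefghijklmnopqrstuvwxyz"

  // All strings one edit away from `word`: deletes, transposes, replaces, inserts.
  def edits1(word: String): Set[String] = {
    val splits = (0 to word.length).map(i => (word.take(i), word.drop(i)))
    val deletes = for ((l, r) <- splits if r.nonEmpty) yield l + r.tail
    val transposes = for ((l, r) <- splits if r.length > 1) yield l + r(1) + r(0) + r.drop(2)
    val replaces = for ((l, r) <- splits if r.nonEmpty; c <- letters) yield l + c + r.tail
    val inserts = for ((l, r) <- splits; c <- letters) yield l + c + r
    (deletes ++ transposes ++ replaces ++ inserts).toSet
  }

  // Try known words at edit distance 0, then 1, then 2; pick the most frequent.
  // Frequencies approximate the prior P(c); the distance tiers approximate P(w|c).
  def correct(word: String, counts: Map[String, Int]): String = {
    def known(ws: Set[String]): Option[Set[String]] =
      Some(ws.filter(counts.contains)).filter(_.nonEmpty)
    lazy val e1 = edits1(word)
    val candidates = known(Set(word))
      .orElse(known(e1))                   // evaluated only if distance 0 fails
      .orElse(known(e1.flatMap(edits1)))   // evaluated only if distance 1 fails
      .getOrElse(Set(word))                // unknown word: return it unchanged
    candidates.maxBy(w => counts.getOrElse(w, 0))
  }
}
```

For example, with counts = Map("spelling" -> 10, "spewing" -> 2), correct("speling", counts) returns "spelling": both words are one edit away, and "spelling" is more frequent.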
A dictionary of correct spellings must be provided with setDictionary either in the form of a text file or directly as an ExternalResource, where each word is parsed by a regex pattern.
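To illustrate the regex parsing step, here is a plain-Scala sketch that turns dictionary text, such as the "words.txt" file in the example, into word frequencies. The object DictionarySketch, the method wordCounts, and the default pattern \S+ are assumptions for illustration; check setDictionary for the pattern Spark NLP actually applies.

```scala
import scala.util.matching.Regex

// Hypothetical helper illustrating regex-based dictionary parsing; NOT Spark NLP code.
object DictionarySketch {
  // Extract every regex match from the dictionary text and count occurrences.
  // The assumed default pattern \S+ treats each whitespace-separated token as a word.
  def wordCounts(text: String, tokenPattern: Regex = "\\S+".r): Map[String, Int] =
    tokenPattern.findAllIn(text).toList
      .groupBy(identity)
      .map { case (word, hits) => word -> hits.size }
}
```

The pattern determines what counts as a word. With \S+, punctuation stays attached ("gummy," is kept as is), while a pattern such as [A-Za-z]+ would keep only the letters.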
Inspired by the spell checker by Peter Norvig: How to Write a Spelling Corrector.
For instantiated/pretrained models, see NorvigSweetingModel.
For extended examples of usage, see the NorvigSweetingTestSpec.
Example
In this example, the dictionary "words.txt" has the following form:
```
...
gummy
gummic
gummier
gummiest
gummiferous
...
```
This dictionary is then set to be the basis of the spell checker.
```
// Assumes an active SparkSession named `spark` (as in spark-shell)
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// The dictionary file of correct spellings, in the format shown above
val spellChecker = new NorvigSweetingApproach()
  .setInputCols("token")
  .setOutputCol("spell")
  .setDictionary("src/test/resources/spell/words.txt")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

// Placeholder data: any DataFrame with a "text" column
val trainingData = Seq("somtimes i wrrite wordz erong.").toDF("text")
val pipelineModel = pipeline.fit(trainingData)
```
See also
SymmetricDeleteApproach for an alternative approach to spell checking
ContextSpellCheckerApproach for a deep-learning-based approach
class NorvigSweetingModel extends AnnotatorModel[NorvigSweetingModel] with HasSimpleAnnotate[NorvigSweetingModel] with NorvigSweetingParams
This annotator retrieves tokens and automatically corrects those that are not found in an English dictionary. It is inspired by Norvig's model and SymSpell.
The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language-independent.
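The Symmetric Delete idea can be sketched in plain Scala for a maximum edit distance of 1. Dictionary words are indexed ahead of time by their single-character deletes, and a lookup generates only the deletes of the input, so inserts, replaces, and transposes are never enumerated. The object and method names are hypothetical, and this is not the SymSpell or Spark NLP code. A real implementation also verifies each candidate with an actual Damerau-Levenshtein distance, because matching deletes can return words up to two edits away.

```scala
// Sketch of the Symmetric Delete idea (as in SymSpell) for max edit distance 1.
// Hypothetical illustration; candidate verification is intentionally omitted.
object SymmetricDeleteSketch {
  // The word itself plus every string obtained by deleting exactly one character.
  def deletes(word: String): Set[String] =
    (0 until word.length).map(i => word.take(i) + word.drop(i + 1)).toSet + word

  // Precompute once: delete variant -> dictionary words that produce it.
  def buildIndex(dictionary: Seq[String]): Map[String, Set[String]] =
    dictionary
      .flatMap(w => deletes(w).map(d => d -> w))
      .groupBy(_._1)
      .map { case (d, pairs) => d -> pairs.map(_._2).toSet }

  // Lookup: generate only the deletes of the input and collect indexed words.
  def candidates(input: String, index: Map[String, Set[String]]): Set[String] =
    deletes(input).flatMap(d => index.getOrElse(d, Set.empty[String]))
}
```

For example, with an index built from Seq("hello", "help", "world"), candidates("helo", index) returns Set("hello", "help"): "helo" is itself a delete of "hello", and "hel" is a delete shared by "helo" and "help".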
This is the instantiated model of the NorvigSweetingApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with the pretrained method of the companion object:
```
val spellChecker = NorvigSweetingModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")
  .setDoubleVariants(true)
```
If no name is provided, the default model is "spellcheck_norvig". For available pretrained models, please see the Models Hub.
For extended examples of usage, see the NorvigSweetingTestSpec.
Example
```
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel

import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker = NorvigSweetingModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("somtimes i wrrite wordz erong.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("spell.result").show(false)
+--------------------------------------+
|result                                |
+--------------------------------------+
|[sometimes, i, write, words, wrong, .]|
+--------------------------------------+
```
See also
SymmetricDeleteModel for an alternative approach to spell checking
ContextSpellCheckerModel for a deep-learning-based approach
trait NorvigSweetingParams extends Params
These are the configuration parameters for the NorvigSweeting model.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/spell/norvig/NorvigSweetingTestSpec.scala for further reference on how to use this API.
trait ReadablePretrainedNorvig extends ParamsAndFeaturesReadable[NorvigSweetingModel] with HasPretrained[NorvigSweetingModel]

Value Members

object NorvigSweetingApproach extends DefaultParamsReadable[NorvigSweetingApproach] with Serializable
This is the companion object of NorvigSweetingApproach. Please refer to that class for the documentation.
object NorvigSweetingModel extends ReadablePretrainedNorvig with Serializable
This is the companion object of NorvigSweetingModel. Please refer to that class for the documentation.
