Packages

package root

Definition Classes: root

package com

Definition Classes: root

package johnsnowlabs

Definition Classes: com

package nlp

Definition Classes: johnsnowlabs

package annotators

Definition Classes: nlp

package spell

Definition Classes: annotators

package context

Definition Classes: spell

class ContextSpellCheckerApproach extends AnnotatorApproach[ContextSpellCheckerModel] with HasFeatures with WeightedLevenshtein

Trains a deep-learning based Noisy Channel Model Spell Algorithm.

Trains a deep-learning based Noisy Channel Model Spell Algorithm. Correction candidates are extracted combining context information and word information.

For instantiated/pretrained models, see ContextSpellCheckerModel.

Spell Checking is a sequence to sequence mapping problem. Given an input sequence, potentially containing a certain number of errors, ContextSpellChecker will rank correction sequences according to three things:

Different correction candidates for each word — word level.
The surrounding text of each word, i.e. it’s context — sentence level.
The relative cost of different correction candidates according to the edit operations at the character level it requires — subword level.

For an in-depth explanation of the module see the article Applying Context Aware Spell Checking in Spark NLP.

For extended examples of usage, see the article Training a Contextual Spell Checker for Italian Language, the Examples and the ContextSpellCheckerTestSpec.

Example

For this example, we use the first Sherlock Holmes book as the training dataset.

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach

import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")


val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker = new ContextSpellCheckerApproach()
  .setInputCols("token")
  .setOutputCol("corrected")
  .setWordMaxDistance(3)
  .setBatchSize(24)
  .setEpochs(8)
  .setLanguageModelClasses(1650)  // dependant on vocabulary size
  // .addVocabClass("_NAME_", names) // Extra classes for correction could be added like this

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val path = "src/test/resources/spell/sherlockholmes.txt"
val dataset = spark.sparkContext.textFile(path)
  .toDF("text")
val pipelineModel = pipeline.fit(dataset)

Definition Classes: context
See also: NorvigSweetingApproach and SymmetricDeleteApproach for alternative approaches to spell checking

ArrayHelper

com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach

ArrayHelper

implicit class ArrayHelper extends AnyRef

Linear Supertypes

AnyRef, Any

Ordering

Alphabetic
By Inheritance

Inherited

ArrayHelper
AnyRef
Any

Hide All
Show All

Visibility

Public
All

Instance Constructors

new ArrayHelper(array: Array[Int])

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def fixSize: Array[Int]
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()

Packages

Example

ArrayHelper

implicit class ArrayHelper extends AnyRef

Instance Constructors

Value Members

Inherited from AnyRef

Inherited from Any

Members

Packages

Example

ArrayHelper 

implicit class ArrayHelper extends AnyRef

Instance Constructors

Value Members

Inherited from AnyRef

Inherited from Any

Members

ArrayHelper