Packages

  • package com.johnsnowlabs.nlp.annotators.spell.context
  • class ContextSpellCheckerApproach extends AnnotatorApproach[ContextSpellCheckerModel] with HasFeatures with WeightedLevenshtein

    Trains a deep-learning-based Noisy Channel Model spell-checking algorithm. Correction candidates are extracted by combining context information and word information.

    For instantiated/pretrained models, see ContextSpellCheckerModel.

    Spell checking is a sequence-to-sequence mapping problem. Given an input sequence that may contain a number of errors, ContextSpellChecker ranks candidate correction sequences according to three criteria:

    1. The different correction candidates for each word (word level).
    2. The surrounding text of each word, i.e. its context (sentence level).
    3. The relative cost of each correction candidate, based on the character-level edit operations it requires (subword level).
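
    The three levels above can be illustrated with a toy ranking function. This is only a sketch under stated assumptions, not the annotator's implementation: the object RankingSketch, its methods, the bigram-count context score and the unit edit costs are all hypothetical. The actual model uses a neural language model and weighted edit costs.

```scala
// Hypothetical sketch: none of these names come from Spark NLP.
object RankingSketch {

  // Subword level: plain Levenshtein distance with unit costs.
  // (Assumption: the real annotator weights edit operations differently.)
  def editDistance(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(identity)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val substitution = prev(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1)
        curr(j) = math.min(substitution, math.min(prev(j) + 1, curr(j - 1) + 1))
      }
      prev = curr
    }
    prev(b.length)
  }

  // Word level: vocabulary words within maxDistance edits of the input word.
  def candidates(word: String, vocab: Set[String], maxDistance: Int): Seq[(String, Int)] =
    vocab.toSeq
      .map(w => (w, editDistance(word, w)))
      .filter { case (_, dist) => dist <= maxDistance }

  // Sentence level: a toy context score from bigram counts with the previous word.
  // The final score rewards likely contexts and penalises costly edits.
  def rank(
      prevWord: String,
      word: String,
      vocab: Set[String],
      bigramCounts: Map[(String, String), Int],
      maxDistance: Int = 2): Seq[String] =
    candidates(word, vocab, maxDistance)
      .map { case (cand, dist) =>
        val contextScore = math.log(1.0 + bigramCounts.getOrElse((prevWord, cand), 0))
        (cand, contextScore - dist.toDouble)
      }
      .sortBy { case (_, score) => -score }
      .map { case (cand, _) => cand }
}
```

    With a vocabulary of "see", "sea" and "set", the misspelling "sef" is one edit away from all three, so only the context term separates them: a preceding "the" favours "sea" if that bigram was seen often, while a preceding "I" favours "see".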

    For an in-depth explanation of the module see the article Applying Context Aware Spell Checking in Spark NLP.

    For extended examples of usage, see the article Training a Contextual Spell Checker for Italian Language, the Examples and the ContextSpellCheckerTestSpec.

    Example

    For this example, we use the first Sherlock Holmes book as the training dataset.

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach
    
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val spellChecker = new ContextSpellCheckerApproach()
      .setInputCols("token")
      .setOutputCol("corrected")
      .setWordMaxDistance(3)
      .setBatchSize(24)
      .setEpochs(8)
      .setLanguageModelClasses(1650)  // dependent on vocabulary size
      // .addVocabClass("_NAME_", names) // Extra classes for correction could be added like this
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))
    
    val path = "src/test/resources/spell/sherlockholmes.txt"
    val dataset = spark.sparkContext.textFile(path)
      .toDF("text")
    val pipelineModel = pipeline.fit(dataset)
    Definition Classes
    context
    See also

    NorvigSweetingApproach and SymmetricDeleteApproach for alternative approaches to spell checking

  • ArrayHelper

implicit class ArrayHelper extends AnyRef

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new ArrayHelper(array: Array[Int])
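
Because ArrayHelper is an implicit class whose constructor takes an Array[Int], any Array[Int] in scope of the import gains its methods without an explicit wrapper. The sketch below shows only that enrichment pattern. The names ArrayHelperSketch, FixedSizeOps, maxLength and padId are hypothetical, and the pad-or-truncate behaviour is an assumption made for illustration: this page does not document what fixSize actually does.

```scala
object ArrayHelperSketch {
  // Assumed values, chosen only for illustration.
  val maxLength = 5
  val padId = 0

  // Hypothetical stand-in for ArrayHelper; the real fixSize may behave differently.
  implicit class FixedSizeOps(array: Array[Int]) {
    // Truncate arrays that are too long, pad short ones with padId.
    def fixSize: Array[Int] =
      if (array.length >= maxLength) array.take(maxLength)
      else array.padTo(maxLength, padId)
  }
}
```

After `import ArrayHelperSketch._`, calling `Array(7, 8, 9).fixSize` compiles as if fixSize were defined on Array[Int] itself, because the compiler inserts the wrapper implicitly.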

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def fixSize: Array[Int]
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
