Packages

package symmetric

Type Members

  1. trait ReadablePretrainedSymmetric extends ParamsAndFeaturesReadable[SymmetricDeleteModel] with HasPretrained[SymmetricDeleteModel]
  2. class SymmetricDeleteApproach extends AnnotatorApproach[SymmetricDeleteModel] with SymmetricDeleteParams

    Trains a Symmetric Delete spelling correction algorithm. Retrieves tokens and utilizes distance metrics to compute possible derived words.

    The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (which generates deletes, transposes, replaces, and inserts) and is language independent. A dictionary of correct spellings must be provided with setDictionary, either as a path to a text file or directly as an ExternalResource; in both cases each word is parsed by a regex pattern.
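
    The core idea of the paragraph above can be illustrated with a minimal sketch in plain Scala. This is not Spark NLP's implementation: the object SymmetricDeleteSketch and its functions deletes, buildIndex, and candidates are hypothetical names introduced only for illustration. The sketch assumes that only delete variants are generated, both for dictionary words (once, up front) and for the input word (at lookup time), and that any dictionary word sharing a variant with the input is a correction candidate.

    ```scala
    object SymmetricDeleteSketch {
      // All strings reachable from `word` by deleting up to `maxDistance`
      // characters, including `word` itself.
      def deletes(word: String, maxDistance: Int): Set[String] = {
        def step(current: Set[String], remaining: Int): Set[String] =
          if (remaining == 0) current
          else {
            // Remove one character at every position of every current variant.
            val next = for {
              w <- current
              i <- w.indices
            } yield w.substring(0, i) + w.substring(i + 1)
            current ++ step(next, remaining - 1)
          }
        step(Set(word), maxDistance)
      }

      // Map every delete variant back to the dictionary words that produce it.
      def buildIndex(dictionary: Seq[String], maxDistance: Int): Map[String, Set[String]] =
        dictionary
          .flatMap(word => deletes(word, maxDistance).map(variant => variant -> word))
          .groupBy(_._1)
          .map { case (variant, pairs) => variant -> pairs.map(_._2).toSet }

      // Dictionary words that share at least one delete variant with `input`.
      def candidates(index: Map[String, Set[String]], input: String, maxDistance: Int): Set[String] =
        deletes(input, maxDistance).flatMap(variant => index.getOrElse(variant, Set.empty[String]))
    }
    ```

    With an index built from Seq("gummy", "gummic", "gummier") and a maximum distance of 1, candidates(index, "gummi", 1) returns Set("gummy", "gummic"). Note that "gummy" differs from "gummi" by a substitution, yet it is found through the shared delete variant "gumm"; this is why no replaces, inserts, or transposes need to be generated. A complete spell checker would still rank or filter these candidates, for example by edit distance or word frequency.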

    Inspired by SymSpell.

    For instantiated/pretrained models, see SymmetricDeleteModel.

    See SymmetricDeleteModelTestSpec for further reference.

    Example

    In this example, the dictionary "words.txt" has the following form:

    ...
    gummy
    gummic
    gummier
    gummiest
    gummiferous
    ...

    This dictionary is then set as the basis of the spell checker.

    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.symmetric.SymmetricDeleteApproach
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val spellChecker = new SymmetricDeleteApproach()
      .setInputCols("token")
      .setOutputCol("spell")
      .setDictionary("src/test/resources/spell/words.txt")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))
    
    // trainingData is assumed to be a DataFrame with a "text" column
    val pipelineModel = pipeline.fit(trainingData)

    See also

    NorvigSweetingApproach for an alternative approach to spell checking

    ContextSpellCheckerApproach for a deep-learning-based approach

  3. class SymmetricDeleteModel extends AnnotatorModel[SymmetricDeleteModel] with HasSimpleAnnotate[SymmetricDeleteModel] with SymmetricDeleteParams

    Symmetric Delete spelling correction algorithm.

    The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (which generates deletes, transposes, replaces, and inserts) and is language independent.
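
    To make the distance metric mentioned above concrete, here is a minimal sketch of an edit distance with adjacent transpositions in plain Scala. It implements the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment), chosen here for brevity; whether the annotator uses this exact variant internally is not stated in this documentation. The object EditDistanceSketch and its distance function are hypothetical names, not part of the Spark NLP API.

    ```scala
    object EditDistanceSketch {
      // Optimal string alignment distance: insertions, deletions, substitutions,
      // and transpositions of two adjacent characters each cost 1.
      def distance(a: String, b: String): Int = {
        // d(i)(j) holds the distance between the first i chars of a and the first j chars of b.
        val d = Array.ofDim[Int](a.length + 1, b.length + 1)
        for (i <- 0 to a.length) d(i)(0) = i
        for (j <- 0 to b.length) d(0)(j) = j
        for (i <- 1 to a.length; j <- 1 to b.length) {
          val cost = if (a(i - 1) == b(j - 1)) 0 else 1
          d(i)(j) = math.min(
            math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), // deletion, insertion
            d(i - 1)(j - 1) + cost)                     // match or substitution
          // Adjacent transposition, e.g. "gn" vs "ng".
          if (i > 1 && j > 1 && a(i - 1) == b(j - 2) && a(i - 2) == b(j - 1))
            d(i)(j) = math.min(d(i)(j), d(i - 2)(j - 2) + 1)
        }
        d(a.length)(b.length)
      }
    }
    ```

    For instance, distance("wrrite", "write") is 1 (one deletion) and distance("wrogn", "wrong") is also 1, because swapping two adjacent characters counts as a single edit rather than two substitutions.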

    Inspired by SymSpell.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val spell = SymmetricDeleteModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")

    If no name is provided, the default model is "spellcheck_sd". For available pretrained models, please see the Models Hub.

    See SymmetricDeleteModelTestSpec for further reference.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.symmetric.SymmetricDeleteModel
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val spellChecker = SymmetricDeleteModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))
    
    val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
    val result = pipeline.fit(data).transform(data)
    result.select("spell.result").show(false)
    +--------------------------------------+
    |result                                |
    +--------------------------------------+
    |[sometimes, i, write, words, wrong, .]|
    +--------------------------------------+

    See also

    NorvigSweetingModel for an alternative approach to spell checking

    ContextSpellCheckerModel for a deep-learning-based approach

  4. trait SymmetricDeleteParams extends Params

Value Members

  1. object SymmetricDeleteApproach extends DefaultParamsReadable[SymmetricDeleteApproach] with Serializable

    This is the companion object of SymmetricDeleteApproach. Please refer to that class for the documentation.

  2. object SymmetricDeleteModel extends ReadablePretrainedSymmetric with Serializable

    This is the companion object of SymmetricDeleteModel. Please refer to that class for the documentation.
