Typo Detector for Icelandic


This model was imported from Hugging Face and was trained on synthetic Icelandic data to detect typos. It uses DistilBERT embeddings with DistilBertForTokenClassification and treats typo detection as a NER-style token classification task. Tokens identified as typos are labeled PO.

Predicted Entities

PO


How to use

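# Python; assumes an active Spark session named `spark` (e.g. spark = sparknlp.start()).
# The column names "document" and "ner_chunk" are assumed; the source omits them.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, DistilBertForTokenClassification, NerConverter

documentAssembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

sentenceDetector = SentenceDetector()\
      .setInputCols(["document"])\
      .setOutputCol("sentence")

tokenizer = Tokenizer()\
      .setInputCols(["sentence"])\
      .setOutputCol("token")

tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_typo_detector", "is")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("ner")

ner_converter = NerConverter()\
      .setInputCols(["sentence", "token", "ner"])\
      .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter])
text = """Það er miög auðvelt að draga marktækar álykanir af texta með Spark NLP."""
data = spark.createDataFrame([[text]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)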
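// Scala; assumes an active SparkSession named `spark`. "document" and "ner_chunk" are assumed column names.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
      .setInputCols(Array("document"))
      .setOutputCol("sentence")

val tokenizer = new Tokenizer()
      .setInputCols(Array("sentence"))
      .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_typo_detector", "is")
      .setInputCols(Array("sentence", "token"))
      .setOutputCol("ner")

val ner_converter = new NerConverter()
      .setInputCols(Array("sentence", "token", "ner"))
      .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter))

val example = Seq("Það er miög auðvelt að draga marktækar álykanir af texta með Spark NLP.").toDS.toDF("text")

val result = pipeline.fit(example).transform(example)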
# NLU (Python)
import nlu
nlu.load("is.ner.distil_bert").predict("""Það er miög auðvelt að draga marktækar álykanir af texta með Spark NLP.""")


Results

+--------+---------+
|chunk   |ner_label|
+--------+---------+
|miög    |PO       |
|álykanir|PO       |
+--------+---------+
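
Once the detected chunks are collected to the driver, it can be handy to show them in context. The sketch below is plain Python and does not call Spark NLP. It assumes each detected chunk is available as a `(begin, end)` pair of character offsets with an inclusive `end`, which is how Spark NLP annotations are usually reported. The helper name `mark_typos` and the bracket markers are hypothetical, chosen only for illustration, and the offsets are computed here from the example sentence instead of being read from a real pipeline result.

```python
def mark_typos(text, spans, left="[", right="]"):
    """Wrap each (begin, end) span of `text` in markers; `end` is inclusive."""
    pieces = []
    cursor = 0
    for begin, end in sorted(spans):
        pieces.append(text[cursor:begin])                    # untouched text before the typo
        pieces.append(left + text[begin:end + 1] + right)    # the flagged token
        cursor = end + 1
    pieces.append(text[cursor:])                             # remainder of the sentence
    return "".join(pieces)


text = "Það er miög auðvelt að draga marktækar álykanir af texta með Spark NLP."

# Stand-in for offsets taken from the pipeline output (assumption: inclusive end).
spans = [(text.index(w), text.index(w) + len(w) - 1) for w in ("miög", "álykanir")]

marked = mark_typos(text, spans)
print(marked)
```

Sorting the spans first keeps the helper independent of the order in which chunks are returned. Overlapping spans are not handled, which should be acceptable for token-level labels.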

Model Information

Model Name: distilbert_token_classifier_typo_detector
Compatibility: Spark NLP 3.3.4+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: is
Size: 505.7 MB
Case sensitive: true
Max sentence length: 256
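
The compatibility entry above states a minimum Spark NLP version of 3.3.4. A minimal sketch of checking an installed version string against that minimum is shown below. The helper names `parse_version` and `is_compatible` are hypothetical. In practice the installed version string would presumably come from `sparknlp.version()`, which is not called here so that the example stays self-contained.

```python
import re


def parse_version(version):
    """Return the leading numeric (major, minor, patch) parts as a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)       # ignore suffixes such as "rc1"
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:                     # pad short versions like "4.0"
        parts.append(0)
    return tuple(parts)


def is_compatible(installed, minimum="3.3.4"):
    """True if `installed` is at least `minimum` (minimum taken from the model card)."""
    return parse_version(installed) >= parse_version(minimum)


checks = {v: is_compatible(v) for v in ("3.3.3", "3.3.4", "3.10.0", "4.0.0rc1")}
print(checks)
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "3.10.0" would sort before "3.3.4".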


Benchmarking

label         precision  recall    f1-score  support
micro-avg     0.98954    0.967603  0.978448  43800.0
macro-avg     0.98954    0.967603  0.978448  43800.0
weighted-avg  0.98954    0.967603  0.978448  43800.0
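
As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the reported precision and recall. The variable names are illustrative, and the result can differ slightly from the reported value because the inputs are already rounded.

```python
precision = 0.98954    # reported precision
recall = 0.967603      # reported recall
reported_f1 = 0.978448 # reported f1-score

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"difference from reported value: {abs(f1 - reported_f1):.1e}")
```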