Description
Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. distilbert-base-uncased-if is an English model originally trained by Aureliano.
Predicted Entities
charge.v.17, kill.v.01, put.v.01, switch_off.v.01, ask.v.01, dig.v.01, search.v.04, repeat.v.01, wear.v.02, play.v.03, ask.v.02, wait.v.01, smash.v.02, clean.v.01, drink.v.01, inventory.v.01, climb.v.01, close.v.01, set.v.05, hit.v.03, remove.v.01, hit.v.02, sit_down.v.01, memorize.v.01, stand.v.03, write.v.07, insert.v.01, light_up.v.05, show.v.01, travel.v.01, listen.v.01, sequence.n.02, brandish.v.01, take_off.v.06, wake_up.v.02, connect.v.01, say.v.08, burn.v.01, talk.v.02, turn.v.09, smell.v.01, pull.v.04, move.v.02, shoot.v.01, press.v.01, exit.v.01, take.v.04, examine.v.02, read.v.01, follow.v.01, jump.v.01, rub.v.01, throw.v.01, answer.v.01, shake.v.01, drive.v.01, buy.v.01, eat.v.01, open.v.01, break.v.05, note.v.04, sleep.v.01, drop.v.01, blow.v.01, fill.v.01, choose.v.01, enter.v.01, pray.v.01, skid.v.04, lower.v.01, lie_down.v.01, cut.v.01, look.v.01, unlock.v.01, give.v.03, tell.v.03, unknown, switch_on.v.01, consult.v.02, raise.v.02, insert.v.02, pour.v.01, touch.v.01, push.v.01
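The labels above appear to follow the WordNet synset naming scheme (lemma, part of speech, sense number), with unknown as a fallback class. As a minimal sketch under that assumption, a predicted label can be split into its parts for downstream handling. The helper is not part of Spark NLP; the names SynsetLabel and parse_label are hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Optional


class SynsetLabel(NamedTuple):
    lemma: str            # e.g. "switch_off"
    pos: Optional[str]    # "v" or "n" in the label list; None for the fallback class
    sense: Optional[int]  # sense number; None for the fallback class


def parse_label(label: str) -> SynsetLabel:
    """Split a predicted label such as 'take.v.04' into lemma, POS and sense.

    Assumes the lemma.pos.sense shape seen in the label list; this is an
    illustrative helper, not a Spark NLP API.
    """
    parts = label.rsplit(".", 2)
    if len(parts) != 3:
        # Labels without the three-part shape (here only "unknown") are kept as-is.
        return SynsetLabel(label, None, None)
    lemma, pos, sense = parts
    return SynsetLabel(lemma, pos, int(sense))


print(parse_label("switch_off.v.01"))
print(parse_label("sequence.n.02"))
print(parse_label("unknown"))
```

Splitting from the right keeps multi-word lemmas such as switch_off or lie_down intact, since only the last two dot-separated fields are treated as part of speech and sense number.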
How to use
# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Assumes an active Spark NLP session named `spark` (for example from sparknlp.start())
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_distilbert_base_uncased_if","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, sequenceClassifier_loaded])
data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
result = pipeline.fit(data).transform(data)
// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq(...).toDF; assumes a SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_distilbert_base_uncased_if","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier_loaded))
val data = Seq("PUT YOUR STRING HERE").toDF("text")
val result = pipeline.fit(data).transform(data)
# NLU
import nlu

nlu.load("en.classify.distil_bert.uncased_base").predict("""PUT YOUR STRING HERE""")
Model Information
| Model Name: | distilbert_sequence_classifier_distilbert_base_uncased_if |
| Compatibility: | Spark NLP 5.2.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [class] |
| Language: | en |
| Size: | 249.7 MB |
| Case sensitive: | false |
| Max sentence length: | 128 |
References
- https://huggingface.co/Aureliano/distilbert-base-uncased-if
- https://rasa.com/docs/rasa/components#languagemodelfeaturizer
- https://github.com/aporporato/jericho-corpora