Italian BertForSequenceClassification Cased model (from Hate-speech-CNERG)

Description

Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. dehatebert-mono-italian is an Italian model originally trained by Hate-speech-CNERG.

Predicted Entities

NON_HATE, HATE

How to use

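import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained Italian classifier
seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_dehatebert_mono_italian", "it") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

# Build the pipeline model and run inference on the same DataFrame
result = pipeline.fit(data).transform(data)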
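
After the transform, predictions sit in the class column as an array of annotation structs, and each struct's result field holds the predicted label. The following is a minimal sketch of flattening collected rows into (text, labels) pairs. The helper name extract_labels and the sample row are hypothetical, and the sample label is a placeholder rather than a real prediction. It assumes the rows were converted to plain dictionaries, for example with Row.asDict(recursive=True).

```python
from typing import Any, Dict, List, Tuple


def extract_labels(
    rows: List[Dict[str, Any]], output_col: str = "class"
) -> List[Tuple[str, List[str]]]:
    """Return (text, [labels]) pairs from rows shaped like the pipeline output.

    Each row is expected to have a "text" key and an `output_col` key holding
    a list of annotation dictionaries with a "result" field.
    """
    pairs = []
    for row in rows:
        annotations = row.get(output_col) or []
        labels = [annotation["result"] for annotation in annotations]
        pairs.append((row["text"], labels))
    return pairs


# Hypothetical row mimicking one collected record; with a live session it
# could come from:
#   rows = [r.asDict(recursive=True)
#           for r in result.select("text", "class").collect()]
sample_rows = [
    {
        "text": "PUT YOUR STRING HERE",
        "class": [
            {
                "annotatorType": "category",
                "begin": 0,
                "end": 19,
                "result": "NON_HATE",  # placeholder label, not a real prediction
                "metadata": {},
            }
        ],
    }
]

print(extract_labels(sample_rows))
```

For a quick look without collecting, result.select("text", "class.result").show(truncate=False) displays the same labels directly from the DataFrame.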

Model Information

Model Name: bert_classifier_dehatebert_mono_italian
Compatibility: Spark NLP 4.1.0+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: it
Size: 628.4 MB
Case sensitive: true
Max sentence length: 256
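
The case sensitivity and maximum sentence length listed above correspond to setters on the annotator. The sketch below sets them explicitly, mainly to make the documented values visible in code; the pretrained model presumably already ships with these settings, so this is illustrative rather than required. The function name build_classifier and the constants are hypothetical. Spark NLP is imported inside the function, so defining it has no dependencies, but calling it needs Spark NLP 4.1.0+ and an active Spark session.

```python
# Values copied from the Model Information list above.
MODEL_NAME = "bert_classifier_dehatebert_mono_italian"
LANGUAGE = "it"
MAX_SENTENCE_LENGTH = 256
CASE_SENSITIVE = True


def build_classifier():
    """Load the pretrained model and apply the documented settings explicitly."""
    # Imported lazily: only needed when the function is actually called.
    from sparknlp.annotator import BertForSequenceClassification

    return (
        BertForSequenceClassification.pretrained(MODEL_NAME, LANGUAGE)
        .setInputCols(["document", "token"])
        .setOutputCol("class")
        .setCaseSensitive(CASE_SENSITIVE)
        .setMaxSentenceLength(MAX_SENTENCE_LENGTH)
    )
```

The returned annotator can take the place of seq_classifier in the pipeline stages shown under How to use.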

References

  • https://huggingface.co/Hate-speech-CNERG/dehatebert-mono-italian
  • https://github.com/punyajoy/DE-LIMIT
  • https://arxiv.org/abs/2004.06465