Description
Identifies whether a Turkish text contains cyberbullying or not.
Predicted Entities
Negative, Positive
How to use
...
berturk_embeddings = BertEmbeddings.pretrained("bert_base_turkish_uncased", "tr") \
.setInputCols("document", "lemma") \
.setOutputCol("embeddings")
embeddingsSentence = SentenceEmbeddings() \
.setInputCols(["document", "embeddings"]) \
.setOutputCol("sentence_embeddings") \
.setPoolingStrategy("AVERAGE")
# ClassifierDLModel takes a single input column of sentence embeddings
document_classifier = ClassifierDLModel.pretrained('classifierdl_berturk_cyberbullying', 'tr') \
.setInputCols(["sentence_embeddings"]) \
.setOutputCol("class")
berturk_pipeline = Pipeline(stages=[document_assembler, tokenizer, normalizer, stopwords_cleaner, lemma, berturk_embeddings, embeddingsSentence, document_classifier])
light_pipeline = LightPipeline(berturk_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
result = light_pipeline.annotate("""Gidişin olsun, dönüşün olmasın inşallah senin..""")
result["class"]
...
val berturk_embeddings = BertEmbeddings.pretrained("bert_base_turkish_uncased", "tr")
.setInputCols("document", "lemma")
.setOutputCol("embeddings")
val embeddingsSentence = SentenceEmbeddings()
.setInputCols(Array("document", "embeddings"))
.setOutputCol("sentence_embeddings")
.setPoolingStrategy("AVERAGE")
// ClassifierDLModel takes a single input column of sentence embeddings
val document_classifier = ClassifierDLModel.pretrained("classifierdl_berturk_cyberbullying", "tr")
.setInputCols(Array("sentence_embeddings"))
.setOutputCol("class")
val berturk_pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, normalizer, stopwords_cleaner, lemma, berturk_embeddings, embeddingsSentence, document_classifier))
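import spark.implicits._ // needed for Seq("").toDF below
// All stages are pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
val light_pipeline = new LightPipeline(berturk_pipeline.fit(Seq("").toDF("text")))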
val result = light_pipeline.annotate("Gidişin olsun, dönüşün olmasın inşallah senin..")
import nlu
nlu.load("tr.classify.cyberbullying").predict("""Gidişin olsun, dönüşün olmasın inşallah senin..""")
Results
['Negative']
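The `SentenceEmbeddings` stage in the pipeline above is configured with `setPoolingStrategy("AVERAGE")`. This setting turns the per-token BERTurk vectors into one vector per document by averaging them. The plain-Python sketch below only illustrates that pooling idea and is not Spark NLP's implementation. The function `average_pool` and the toy 3-dimensional vectors are hypothetical; base-size BERT models such as BERTurk produce 768-dimensional token vectors.

```python
from typing import List

Vector = List[float]


def average_pool(token_vectors: List[Vector]) -> Vector:
    """Hypothetical helper: element-wise mean of equally sized token vectors."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    # Sum each coordinate across all tokens, then divide by the token count.
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy 3-dimensional vectors standing in for three lemma embeddings.
tokens = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 1.0],
]
sentence_vector = average_pool(tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

The resulting single vector per document is what the classifier stage consumes through its `sentence_embeddings` input column.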
Model Information
Model Name: classifierdl_berturk_cyberbullying
Compatibility: Spark NLP 3.1.2+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: tr
Data Source
Trained on a custom dataset using Turkish BERT embeddings (BERTurk).
Benchmarking
              precision    recall  f1-score   support
    Negative       0.83      0.80      0.81       970
    Positive       0.84      0.87      0.86      1225
    accuracy                           0.84      2195
   macro avg       0.84      0.83      0.84      2195
weighted avg       0.84      0.84      0.84      2195
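The `macro avg` and `weighted avg` rows summarize the two per-class rows. The macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The sketch below recomputes both from the rounded per-class figures in the table; the helper names `macro_avg` and `weighted_avg` are hypothetical. The unrounded per-class scores are not published, so a recomputed value may differ slightly from the table.

```python
# Per-class rows copied from the benchmarking table: (precision, recall, f1, support).
report = {
    "Negative": (0.83, 0.80, 0.81, 970),
    "Positive": (0.84, 0.87, 0.86, 1225),
}


def macro_avg(rows):
    """Unweighted mean of precision, recall and F1 over classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


def weighted_avg(rows):
    """Support-weighted mean of precision, recall and F1 over classes."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


total_support = sum(r[3] for r in report.values())
print(total_support)  # 2195
print([round(x, 3) for x in macro_avg(report)])
print([round(x, 3) for x in weighted_avg(report)])
```

All three macro averages come out at about 0.835. The weighted averages are about 0.836, 0.839 and 0.838, which all round to the table's 0.84. For single-label classification, support-weighted recall equals accuracy, since both reduce to the number of correct predictions divided by the total support. This matches the 0.84 accuracy row.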