Description
Classifies Turkish news texts by topic.
Predicted Entities
kultur, saglik, ekonomi, teknoloji, siyaset, spor
How to use
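# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import BertSentenceEmbeddings, ClassifierDLModel

# Assumes an active Spark NLP session bound to `spark`, e.g. spark = sparknlp.start()
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# LaBSE sentence embeddings, which the classifier expects as input
embeddings = BertSentenceEmbeddings \
    .pretrained("labse", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

document_classifier = ClassifierDLModel.pretrained("classifierdl_bert_news", "tr") \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")

nlpPipeline = Pipeline(stages=[document, embeddings, document_classifier])
# Fit on an empty DataFrame: every stage is pretrained, so nothing is trained here
light_pipeline = LightPipeline(nlpPipeline.fit(spark.createDataFrame([['']]).toDF("text")))
result = light_pipeline.annotate('Bonservisi elinde olan Milli oyuncu, yeni takımıyla el sıkıştı.')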
// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // for Seq(...).toDF; assumes a SparkSession named `spark`

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val embeddings = BertSentenceEmbeddings
  .pretrained("labse", "xx")
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")

val document_classifier = ClassifierDLModel.pretrained("classifierdl_bert_news", "tr")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")

val nlpPipeline = new Pipeline().setStages(Array(document, embeddings, document_classifier))
// Fit on an empty DataFrame: every stage is pretrained, so nothing is trained here
val light_pipeline = new LightPipeline(nlpPipeline.fit(Seq("").toDF("text")))
val result = light_pipeline.annotate("Bonservisi elinde olan Milli oyuncu, yeni takımıyla el sıkıştı.")
# NLU (Python)
import nlu
nlu.load("tr.classify.news").predict("""Bonservisi elinde olan Milli oyuncu, yeni takımıyla el sıkıştı.""")
Results
["spor"]
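The label shown above is the content of the pipeline's "class" output column. As a minimal sketch of how it might be read out of the result, assuming `LightPipeline.annotate` returns a dict that maps output column names to lists of strings: the helper names `predicted_labels` and `top_label` and the hand-written `sample_annotation` dict are illustrative stand-ins, not part of Spark NLP, so the snippet runs without a Spark session.

```python
from typing import Dict, List, Optional


def predicted_labels(annotation: Dict[str, List[str]], output_col: str = "class") -> List[str]:
    """Return every label stored under the classifier's output column."""
    return list(annotation.get(output_col, []))


def top_label(annotation: Dict[str, List[str]], output_col: str = "class") -> Optional[str]:
    """Return the first predicted label, or None if the column is missing or empty."""
    labels = predicted_labels(annotation, output_col)
    return labels[0] if labels else None


# Hypothetical stand-in for the dict returned by light_pipeline.annotate(...).
# The keys mirror the output columns configured in the pipeline above.
sample_annotation = {
    "document": ["Bonservisi elinde olan Milli oyuncu, yeni takımıyla el sıkıştı."],
    "class": ["spor"],
}

print(predicted_labels(sample_annotation))
print(top_label(sample_annotation))
```

With a real pipeline, the same helpers would be applied to `result` in place of `sample_annotation`.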
Model Information
| Model Name: | classifierdl_bert_news |
| Compatibility: | Spark NLP 3.0.2+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [class] |
| Language: | tr |
| Dependencies: | labse_BERT |
Data Source
Trained on a custom dataset with multilingual BERT embeddings (labse).
Benchmarking
              precision    recall  f1-score   support
     ekonomi       0.88      0.86      0.87       263
      kultur       0.93      0.96      0.94       277
      saglik       0.95      0.96      0.95       273
     siyaset       0.89      0.91      0.90       257
        spor       0.97      0.97      0.97       279
   teknoloji       0.94      0.88      0.91       250
    accuracy                           0.93      1599
   macro avg       0.93      0.92      0.93      1599
weighted avg       0.93      0.93      0.93      1599
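The per-class figures in this report are related by the usual definition F1 = 2PR / (P + R), and the support column sums to the test-set size. The following sketch recomputes those quantities from the rounded values printed above; the names `REPORT` and `f1_score` are illustrative. Because the inputs are already rounded to two decimals, recomputed values can differ from the reported ones by about 0.01, which is most visible in the averaged rows.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support).
REPORT = {
    "ekonomi":   (0.88, 0.86, 0.87, 263),
    "kultur":    (0.93, 0.96, 0.94, 277),
    "saglik":    (0.95, 0.96, 0.95, 273),
    "siyaset":   (0.89, 0.91, 0.90, 257),
    "spor":      (0.97, 0.97, 0.97, 279),
    "teknoloji": (0.94, 0.88, 0.91, 250),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each class from its rounded precision and recall.
recomputed_f1 = {label: f1_score(p, r) for label, (p, r, _, _) in REPORT.items()}

# Total number of test examples across all classes.
total_support = sum(row[3] for row in REPORT.values())

# Macro averages weight every class equally, regardless of support.
macro_precision = sum(row[0] for row in REPORT.values()) / len(REPORT)
macro_recall = sum(row[1] for row in REPORT.values()) / len(REPORT)

for label, (_, _, reported, support) in REPORT.items():
    print(f"{label:<10} reported={reported:.2f} "
          f"recomputed={recomputed_f1[label]:.4f} support={support}")
print(f"total support   = {total_support}")
print(f"macro precision = {macro_precision:.4f}")
print(f"macro recall    = {macro_recall:.4f}")
```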