Sentiment Analysis of Danish texts

Description

This model was imported from Hugging Face and fine-tuned for Danish. It uses Danish BERT embeddings with a BertForSequenceClassification head to classify the sentiment of a text.

Predicted Entities

negative, positive, neutral
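A sequence classification head produces one raw score (logit) per class for each document, and the predicted entity is the class with the highest probability. The sketch below illustrates that final step with a softmax followed by an argmax. It is a minimal illustration, not the library's internal code: the label order in `LABELS` and the logits are hypothetical, since the model card lists the classes but not their index order.

```python
import math

# Hypothetical label order: the card lists the classes but not the
# index order the classification head uses internally.
LABELS = ["negative", "positive", "neutral"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most probable label and a label -> probability mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Made-up logits for illustration only (not real model output).
label, probs = predict_label([2.1, -1.3, 0.4])
print(label)  # negative
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but avoids overflow for large scores.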


How to use

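Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

# Start (or attach to) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap raw text in Spark NLP "document" annotations
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Pretrained Danish sentiment classifier; writes the predicted label to 'class'
sequenceClassifier = BertForSequenceClassification \
    .pretrained('bert_sequence_classifier_sentiment', 'da') \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')

pipeline = Pipeline(stages=[document_assembler, tokenizer, sequenceClassifier])

example = spark.createDataFrame([['Protester over hele landet ledet af utilfredse civilsamfund på grund af den danske regerings COVID-19 lockdown-politik er kommet ud af kontrol.']]).toDF("text")
result = pipeline.fit(example).transform(example)

# Show the predicted label(s)
result.select("class.result").show(truncate=False)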
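Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._

// Wrap raw text in Spark NLP "document" annotations
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Pretrained Danish sentiment classifier; writes the predicted label to "class"
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sequence_classifier_sentiment", "da")
    .setInputCols("document", "token")
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val example = Seq("Protester over hele landet ledet af utilfredse civilsamfund på grund af den danske regerings COVID-19 lockdown-politik er kommet ud af kontrol.").toDF("text")

val result = pipeline.fit(example).transform(example)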
NLU
import nlu
nlu.load("da.classify.bert.sentiment").predict("""Protester over hele landet ledet af utilfredse civilsamfund på grund af den danske regerings COVID-19 lockdown-politik er kommet ud af kontrol.""")

Results

['negative']
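When the pipeline is run over many documents, each document yields its own label list in the shape of the result shown above. The sketch below summarises such per-document predictions as label proportions. It assumes the predictions have already been collected into plain Python lists, and `label_distribution` is a hypothetical helper, not part of Spark NLP.

```python
from collections import Counter


def label_distribution(predictions):
    """Summarise per-document label lists (e.g. ['negative']) as proportions."""
    counts = Counter(label for labels in predictions for label in labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()} if total else {}


# Hypothetical predictions for four documents (not real model output).
preds = [["negative"], ["positive"], ["negative"], ["neutral"]]
print(label_distribution(preds))  # {'negative': 0.5, 'positive': 0.25, 'neutral': 0.25}
```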

Model Information

Model Name: bert_sequence_classifier_sentiment
Compatibility: Spark NLP 3.3.4+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: da
Size: 415.2 MB
Case sensitive: true
Max sentence length: 256
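The 256 limit applies to the model's input tokens. Longer input is expected to be truncated, as is usual for BERT-style classifiers, so text past the limit would not affect the prediction. One workaround is to split long texts into shorter pieces and classify each piece separately. The sketch below does this at word level. It is an approximation: the real limit counts WordPiece sub-tokens, and one word can become several sub-tokens, so the hypothetical `chunk_words` helper defaults to 200 words to leave some headroom.

```python
def chunk_words(text, max_tokens=200):
    """Split text into chunks of at most max_tokens whitespace-separated words.

    The default of 200 stays below the model's 256 limit as a rough margin,
    because a single word may be split into several WordPiece sub-tokens.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


long_text = " ".join(["ord"] * 450)
chunks = chunk_words(long_text)
print([len(c.split()) for c in chunks])  # [200, 200, 50]
```

Each chunk can then be given its own row in the input DataFrame, and the per-chunk labels combined afterwards, for example by majority vote.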

Data Source

https://huggingface.co/DaNLP/da-bert-tone-sentiment-polarity#training-data