Description
This a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5).
This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further finetuning on related sentiment analysis tasks.
Predicted Entities
1 star
, 2 stars
, 3 stars
, 4 stars
, 5 stars
How to use
document_assembler = DocumentAssembler() \
.setInputCol('text') \
.setOutputCol('document')
tokenizer = Tokenizer() \
.setInputCols(['document']) \
.setOutputCol('token')
sequenceClassifier = BertForSequenceClassification \
.pretrained('bert_sequence_classifier_multilingual_sentiment', 'xx') \
.setInputCols(['token', 'document']) \
.setOutputCol('class') \
.setCaseSensitive(False) \
.setMaxSentenceLength(512)
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
sequenceClassifier
])
example = spark.createDataFrame([['I really liked that movie!']]).toDF("text")
result = pipeline.fit(example).transform(example)
val document_assembler = DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val tokenizer = Tokenizer()
.setInputCols("document")
.setOutputCol("token")
val tokenClassifier = BertForSequenceClassification.pretrained("bert_sequence_classifier_multilingual_sentiment", "xx")
.setInputCols("document", "token")
.setOutputCol("class")
.setCaseSensitive(false)
.setMaxSentenceLength(512)
val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))
val example = Seq("I really liked that movie!").toDS.toDF("text")
val result = pipeline.fit(example).transform(example)
import nlu
nlu.load("xx.classify.bert.sentiment.multilingual").predict("""I really liked that movie!""")
Model Information
Model Name: | bert_sequence_classifier_multilingual_sentiment |
Compatibility: | Spark NLP 5.5.1+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [document, token] |
Output Labels: | [class] |
Language: | xx |
Size: | 627.8 MB |
Benchmarking
The finetuned model obtained the following accuracy on 5,000 held-out product reviews in each of the languages:
- Accuracy (exact) is the exact match on the number of stars.
- Accuracy (off-by-1) is the percentage of reviews where the number of stars the model predicts differs by a maximum of 1 from the number given by the human reviewer.
| Language | Accuracy (exact) | Accuracy (off-by-1) |
| -------- | ---------------------- | ------------------- |
| English | 67% | 95%
| Dutch | 57% | 93%
| German | 61% | 94%
| French | 59% | 94%
| Italian | 59% | 95%
| Spanish | 58% | 95%