Description
Analyze sentiment in reviews by classifying them as positive or negative. When the sentiment probability is below a customizable threshold (default 0.6), the document is labeled neutral instead. The model is trained on multilingual UniversalSentenceEncoder sentence embeddings and uses a deep learning (DL) classifier to predict the sentiment.
Predicted Entities
positive, negative, neutral
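The threshold rule described above can be illustrated with a minimal plain-Python sketch. This is not the library's internal code: the function name `label_with_threshold` and the example scores are hypothetical, and the sketch assumes the classifier exposes one probability per label.

```python
from typing import Dict


def label_with_threshold(
    scores: Dict[str, float],
    threshold: float = 0.6,      # default taken from the description above
    fallback: str = "neutral",
) -> str:
    """Return the highest-scoring label, or `fallback` if its score is below `threshold`."""
    best_label = max(scores, key=scores.get)
    if scores[best_label] < threshold:
        return fallback
    return best_label


# Made-up probabilities, for illustration only.
print(label_with_threshold({"positive": 0.91, "negative": 0.09}))  # positive
print(label_with_threshold({"positive": 0.55, "negative": 0.45}))  # neutral
```

In Spark NLP itself the threshold is a parameter of the SentimentDLModel annotator; to the best of my knowledge it is set with `setThreshold`, but check the API reference for the version you use.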
How to use
Use this model in a pipeline together with the pretrained multilingual UniversalSentenceEncoder annotator tfhub_use_multi_lg.
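# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel
from pyspark.ml import Pipeline

spark = sparknlp.start()  # assumed setup: start a Spark session with Spark NLP

# Turn the raw text column into document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Compute multilingual sentence embeddings for each document
use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")
# Classify the embeddings with the Thai sentiment model
sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")
pipeline = Pipeline(stages=[document_assembler, use, sentimentdl])
example = spark.createDataFrame([['เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555']], ["text"])
result = pipeline.fit(example).transform(example)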
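// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDF; assumes an active SparkSession named spark

// Turn the raw text column into document annotations
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Compute multilingual sentence embeddings for each document
val use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
// Classify the embeddings with the Thai sentiment model
val sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("sentiment")
val pipeline = new Pipeline().setStages(Array(document_assembler, use, sentimentdl))
val data = Seq("เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555").toDF("text")
val result = pipeline.fit(data).transform(data)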
# Python, using the NLU library
import nlu
text = ["""เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555"""]
sentiment_df = nlu.load('th.classify.sentiment').predict(text)
sentiment_df
Results
+-------------------------------------+----------+
|text                                 |result    |
+-------------------------------------+----------+
|เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555 |[positive]|
+-------------------------------------+----------+
Model Information
| Model Name: | sentiment_jager_use |
| Compatibility: | Spark NLP 2.7.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [sentiment] |
| Language: | th |
Data Source
The model was trained on the custom corpus from Jager V3.
Benchmarking
| sentiment | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative | 0.94 | 0.99 | 0.96 | 82 |
| positive | 0.97 | 0.87 | 0.92 | 38 |
| accuracy | | | 0.95 | 120 |
| macro avg | 0.96 | 0.93 | 0.94 | 120 |
| weighted avg | 0.95 | 0.95 | 0.95 | 120 |
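As a sanity check on how the columns relate, the sketch below recomputes the f1-score and its macro and weighted averages from the per-class precision, recall, and support in the table. It starts from the already rounded table values, so it is only an approximate reconstruction and not the original evaluation script; the variable and function names are illustrative.

```python
# Per-class values copied from the benchmark table (already rounded to two decimals).
table = {
    "negative": {"precision": 0.94, "recall": 0.99, "support": 82},
    "positive": {"precision": 0.97, "recall": 0.87, "support": 38},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(row["precision"], row["recall"]) for label, row in table.items()}
total_support = sum(row["support"] for row in table.values())

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: mean over classes, weighted by each class's support.
weighted_f1 = sum(f1_scores[label] * table[label]["support"] for label in table) / total_support

for label, score in f1_scores.items():
    print(f"{label}: f1 = {score:.2f}")
print(f"macro avg f1 = {macro_f1:.2f}")
print(f"weighted avg f1 = {weighted_f1:.2f}")
```

Rounded to two decimals, these values agree with the f1-score column above. The evaluation set contains only positive and negative examples, so the neutral label does not appear in the table.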