Description
Analyze sentiment in reviews by classifying them as positive or negative. When the sentiment probability is below a customizable threshold (default 0.6), the resulting document is labeled neutral. This model is trained on multilingual UniversalSentenceEncoder sentence embeddings and uses a deep learning (DL) approach to classify sentiment.
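The thresholding rule described above can be sketched in plain Python. This is a minimal illustration, not Spark NLP's implementation: the function name `label_with_threshold` and the dictionary of class probabilities are assumptions made for the example. Only the 0.6 default and the neutral fallback come from the description.

```python
from typing import Dict


def label_with_threshold(probs: Dict[str, float],
                         threshold: float = 0.6,
                         fallback: str = "neutral") -> str:
    """Return the most probable label, or `fallback` if its probability is below `threshold`."""
    if not probs:
        raise ValueError("probs must contain at least one label")
    # Pick the class with the highest probability.
    best_label = max(probs, key=probs.get)
    # A prediction that is not confident enough falls back to the neutral label.
    if probs[best_label] < threshold:
        return fallback
    return best_label


print(label_with_threshold({"positive": 0.91, "negative": 0.09}))  # positive
print(label_with_threshold({"positive": 0.55, "negative": 0.45}))  # neutral
```

In a Spark NLP pipeline the cut-off is a parameter of the SentimentDLModel annotator, so a call such as `setThreshold(0.7)` on that stage would make the model label more documents as neutral.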
Predicted Entities
positive, negative, neutral
How to use
Use in a pipeline with the pretrained multilingual UniversalSentenceEncoder annotator tfhub_use_multi_lg.
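# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel

# Assumes an active Spark NLP session named `spark` (for example, from sparknlp.start()).
# Turn the raw "text" column into document annotations.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Multilingual sentence embeddings that the sentiment model takes as input.
use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")
# Thai sentiment classifier applied to the sentence embeddings.
sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")
pipeline = Pipeline(stages=[document_assembler, use, sentimentdl])
example = spark.createDataFrame([['เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555']], ["text"])
result = pipeline.fit(example).transform(example)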
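// Scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.UniversalSentenceEncoder
import com.johnsnowlabs.nlp.annotators.classifier.dl.SentimentDLModel
import org.apache.spark.ml.Pipeline
// Needed for .toDF; assumes an active SparkSession named `spark`.
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("sentiment")
val pipeline = new Pipeline().setStages(Array(document_assembler, use, sentimentdl))
val data = Seq("เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555").toDF("text")
val result = pipeline.fit(data).transform(data)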
# NLU (Python)
import nlu

text = ["""เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555"""]
sentiment_df = nlu.load('th.classify.sentiment').predict(text)
sentiment_df
Results
+-------------------------------------+----------+
|text                                 |result    |
+-------------------------------------+----------+
|เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555|[positive]|
+-------------------------------------+----------+
Model Information
| Model Name:    | sentiment_jager_use   |
| Compatibility: | Spark NLP 2.7.1+      |
| License:       | Open Source           |
| Edition:       | Official              |
| Input Labels:  | [sentence_embeddings] |
| Output Labels: | [sentiment]           |
| Language:      | th                    |
Data Source
The model was trained on the custom corpus from Jager V3.
Benchmarking
| sentiment | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative | 0.94 | 0.99 | 0.96 | 82 |
| positive | 0.97 | 0.87 | 0.92 | 38 |
| accuracy | | | 0.95 | 120 |
| macro avg | 0.96 | 0.93 | 0.94 | 120 |
| weighted avg | 0.95 | 0.95 | 0.95 | 120 |
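As a reading aid, the two average rows can be recomputed from the per-class rows. The sketch below is an illustration only: the helper names `macro_avg` and `weighted_avg` are made up for this example, and the inputs are the already-rounded values from the table, so the results match the reported averages only to within rounding.

```python
# Per-class rows copied from the benchmark table (values already rounded to 2 decimals).
rows = {
    "negative": {"precision": 0.94, "recall": 0.99, "f1": 0.96, "support": 82},
    "positive": {"precision": 0.97, "recall": 0.87, "f1": 0.92, "support": 38},
}


def macro_avg(metric: str) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(r[metric] for r in rows.values()) / len(rows)


def weighted_avg(metric: str) -> float:
    """Mean over classes weighted by support (number of test examples per class)."""
    total = sum(r["support"] for r in rows.values())
    return sum(r[metric] * r["support"] for r in rows.values()) / total


for metric in ("precision", "recall", "f1"):
    print(f"{metric}: macro={macro_avg(metric):.3f} weighted={weighted_avg(metric):.3f}")
```

Because the negative class has more than twice the support of the positive class (82 versus 38), the weighted averages sit closer to the negative row than the macro averages do.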