Sentiment Analysis of Tweets (sentimentdl_use_twitter)

Description

Classify sentiment in tweets as negative or positive using Universal Sentence Encoder embeddings.

Predicted Entities

positive, negative


How to use

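import sparknlp
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap raw text in a Spark NLP document annotation
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Embed each document with the Universal Sentence Encoder
use = UniversalSentenceEncoder.pretrained("tfhub_use", lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# Classify each embedding as positive or negative
classifier = SentimentDLModel.pretrained("sentimentdl_use_twitter") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")

nlp_pipeline = Pipeline(stages=[document_assembler, use, classifier])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
l_model = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

annotations = l_model.fullAnnotate([
    "im meeting up with one of my besties tonight! Cant wait!!  - GIRL TALK!!",
    "is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!",
])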

import nlu
nlu.load("en.sentiment.twitter.dl").predict("""is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!""")

Results

|    | document                                                                                                         | sentiment   |
|---:|:---------------------------------------------------------------------------------------------------------------- |:------------|
|  0 | im meeting up with one of my besties tonight! Cant wait!!  - GIRL TALK!!                                         | positive    |
|  1 | is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!  | negative    |
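
A table like the one above can be rebuilt from the list returned by l_model.fullAnnotate. The following is a minimal sketch, assuming each entry is a dict keyed by output column name ("document", "sentiment") whose values are lists of annotation objects exposing a .result attribute. The helper name to_rows is hypothetical, and the SimpleNamespace objects are stand-ins (not real model output) so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def to_rows(annotations):
    """Flatten fullAnnotate-style output into (document, sentiment) pairs."""
    rows = []
    for entry in annotations:
        # Each column holds a list of annotation objects; join their text results.
        text = " ".join(a.result for a in entry["document"])
        # One sentiment annotation per document is expected; guard against none.
        labels = [a.result for a in entry["sentiment"]]
        rows.append((text, labels[0] if labels else None))
    return rows


# Stand-in for real fullAnnotate output (assumed shape, illustration only).
fake_annotations = [
    {
        "document": [SimpleNamespace(result="Cant wait!!")],
        "sentiment": [SimpleNamespace(result="positive")],
    },
    {
        "document": [SimpleNamespace(result="Blah!")],
        "sentiment": [],  # no label produced for this document
    },
]

rows = to_rows(fake_annotations)
for text, label in rows:
    print(f"{label}: {text}")
```

With the real pipeline, to_rows would be called on the annotations list in place of fake_annotations.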

Model Information

Model Name: sentimentdl_use_twitter
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [sentiment]
Language: en
Dependencies: tfhub_use

Data Source

Trained on the Sentiment140 dataset, which comprises 1.6M tweets. https://www.kaggle.com/kazanova/sentiment140

Benchmarking

loss: 7930.071 - acc: 0.80694044 - val_acc: 80.00508 - batches: 16000
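
The two accuracy figures in the log line above appear to be on different scales: acc looks like a fraction and val_acc like a percentage. A minimal sketch for parsing the line and putting both on a 0 to 1 scale follows. The helper names parse_log and as_fraction, and the rule that any accuracy above 1 was logged as a percentage, are assumptions made for illustration and are not part of Spark NLP.

```python
def parse_log(line):
    """Parse 'key: value - key: value' pairs into a dict of floats."""
    metrics = {}
    for part in line.split(" - "):
        key, value = part.split(":")
        metrics[key.strip()] = float(value)
    return metrics


def as_fraction(value):
    # Assumption: an accuracy above 1 was logged as a percentage.
    return value / 100.0 if value > 1.0 else value


log_line = "loss: 7930.071 - acc: 0.80694044 - val_acc: 80.00508 - batches: 16000"
metrics = parse_log(log_line)

train_acc = as_fraction(metrics["acc"])
val_acc = as_fraction(metrics["val_acc"])
print(f"train acc: {train_acc:.4f}, val acc: {val_acc:.4f}")
```

Read this way, training and validation accuracy are both close to 0.80.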