Sentiment Analysis of IMDB Reviews (sentimentdl_use_imdb)

Description

Classifies IMDB reviews into negative and positive categories using Universal Sentence Encoder embeddings.

Predicted Entities

neg, pos


How to use

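import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap raw text in a document annotation
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Embed each document with Universal Sentence Encoder
use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# Classify the sentence embeddings with the pretrained IMDB sentiment model
classifier = SentimentDLModel.pretrained('sentimentdl_use_imdb') \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")

nlp_pipeline = Pipeline(stages=[document_assembler, use, classifier])

# Fit on an empty DataFrame; every stage is pretrained, so nothing is trained here
l_model = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = l_model.fullAnnotate('Demonicus is a movie turned into a video game! I just love the story and the things that goes on in the film.It is a B-film ofcourse but that doesn`t bother one bit because its made just right and the music was rad! Horror and sword fight freaks,buy this movie now!')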
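
`fullAnnotate` returns one dictionary per input text, keyed by output column, where each value is a list of annotation objects exposing `result` and `metadata`. The following is a minimal sketch of pulling the predicted label out of `annotations`, assuming the output column is named `sentiment` as in the pipeline above. The helper name `extract_sentiment` is hypothetical and not part of Spark NLP.

```python
def extract_sentiment(annotations, column="sentiment"):
    """Collect (label, metadata) pairs from LightPipeline.fullAnnotate output.

    Assumption: `annotations` is a list with one dict per input text, and
    annotations[i][column] is a list of objects that expose `.result`
    (the predicted label) and `.metadata` (a dict-like mapping).
    """
    pairs = []
    for doc in annotations:
        # A missing column yields no pairs rather than raising KeyError.
        for ann in doc.get(column, []):
            pairs.append((ann.result, dict(ann.metadata)))
    return pairs
```

Called as `extract_sentiment(annotations)`, this should give one pair per classified document. What the metadata contains depends on the Spark NLP version, so inspect it before relying on specific keys.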

import nlu
nlu.load("en.sentiment.imdb.use.dl").predict("""Demonicus is a movie turned into a video game! I just love the story and the things that goes on in the film.It is a B-film ofcourse but that doesn`t bother one bit because its made just right and the music was rad! Horror and sword fight freaks,buy this movie now!""")

Results

|    | document | sentiment |
|---:|:---------|:----------|
|  0 | Demonicus is a movie turned into a video game! I just love the story and the things that goes on in the film.It is a B-film ofcourse but that doesn`t bother one bit because its made just right and the music was rad! Horror and sword fight freaks,buy this movie now! | positive |

Model Information

Model Name:	sentimentdl_use_imdb
Compatibility:	Spark NLP 2.7.0+
License:	Open Source
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[sentiment]
Language:	en
Dependencies:	tfhub_use

Data Source

This model is trained on data from https://ai.stanford.edu/~amaas/data/sentiment/

Benchmarking

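|              | precision | recall | f1-score | support |
|:-------------|----------:|-------:|---------:|--------:|
| neg          |      0.88 |   0.82 |     0.85 |   12500 |
| pos          |      0.84 |   0.88 |     0.86 |   12500 |
| accuracy     |           |        |     0.85 |   25000 |
| macro avg    |      0.86 |   0.86 |     0.85 |   25000 |
| weighted avg |      0.86 |   0.85 |     0.85 |   25000 |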
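
The f1-score column is the harmonic mean of precision and recall. As a rough sanity check, the sketch below recomputes it from the per-class figures reported above. The inputs are already rounded to two decimals, so the recomputed values are approximate, and the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class (precision, recall) copied from the benchmark table.
reported = {"neg": (0.88, 0.82), "pos": (0.84, 0.88)}

f1_by_class = {label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_score(p, r) for p, r in reported.values()) / len(reported)

print(f1_by_class, round(macro_f1, 2))
# prints: {'neg': 0.85, 'pos': 0.86} 0.85
```

Because both classes have the same support (12500), the macro and weighted averages coincide up to rounding.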
