sparknlp.annotator.sentiment.vivekn_sentiment
Contains classes for ViveknSentiment.
Module Contents
Classes
ViveknSentimentApproach | Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.
ViveknSentimentModel | Sentiment analyser inspired by the algorithm by Vivek Narayanan.
- class ViveknSentimentApproach
Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.
The analyzer requires sentence boundaries to give a score in context, and tokenization to make sure tokens are within bounds. Transitivity requirements also apply.
The training data needs to consist of a column of normalized text and a label column (either "positive" or "negative").
For extended examples of usage, see the Examples.
Input Annotation types | Output Annotation type
TOKEN, DOCUMENT | SENTIMENT
- Parameters:
- sentimentCol
Column with the sentiment label of every row. Must be "positive" or "negative".
- pruneCorpus
Removes infrequent scenarios from scope. Higher values give better performance. Defaults to 1.
References
The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.
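As a rough illustration of the kind of classifier the reference describes, the following is a minimal plain-Python sketch of a multinomial Naive Bayes sentiment classifier with Laplace smoothing. It is not Spark NLP's implementation and omits the paper's enhancements. The function names, the toy corpus, and the reading of `prune_corpus` as a minimum overall token count are assumptions made for this sketch only.

```python
import math
from collections import Counter

LABELS = ("positive", "negative")


def train_naive_bayes(samples, prune_corpus=1):
    """Count tokens per label. `samples` is a list of (tokens, label) pairs."""
    token_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for tokens, label in samples:
        if label not in token_counts:
            raise ValueError(f"label must be one of {LABELS}, got {label!r}")
        token_counts[label].update(tokens)
        doc_counts[label] += 1
    # Assumed pruning rule: keep tokens seen at least `prune_corpus` times overall.
    totals = token_counts["positive"] + token_counts["negative"]
    vocab = {tok for tok, count in totals.items() if count >= prune_corpus}
    return token_counts, doc_counts, vocab


def classify(tokens, model):
    """Return the label with the highest smoothed log-probability."""
    token_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        counts = token_counts[label]
        total = sum(counts[tok] for tok in vocab)
        # Smoothed class prior, so an unseen class never yields log(0).
        score = math.log((doc_counts[label] + 1) / (n_docs + len(LABELS)))
        for tok in tokens:
            if tok in vocab:  # tokens outside the vocabulary are ignored
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy corpus of already-normalized (lowercased, punctuation-free) text.
samples = [
    ("i really liked this movie".split(), "positive"),
    ("the cast was horrible".split(), "negative"),
    ("never going to watch this again or recommend it to anyone".split(), "negative"),
    ("its a waste of time".split(), "negative"),
    ("i loved the protagonist".split(), "positive"),
    ("the music was really really good".split(), "positive"),
]
model = train_naive_bayes(samples)
prediction_a = classify("i recommend this movie".split(), model)
prediction_b = classify("dont waste your time".split(), model)
```

Raising `prune_corpus` in this sketch shrinks the vocabulary, which reduces the work done per document at the cost of ignoring rare words.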
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> # Start (or reuse) a Spark session with Spark NLP loaded
>>> spark = sparknlp.start()
>>> document = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> token = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> normalizer = Normalizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("normal")
>>> vivekn = ViveknSentimentApproach() \
...     .setInputCols(["document", "normal"]) \
...     .setSentimentCol("train_sentiment") \
...     .setOutputCol("result_sentiment")
>>> finisher = Finisher() \
...     .setInputCols(["result_sentiment"]) \
...     .setOutputCols("final_sentiment")
>>> pipeline = Pipeline().setStages([document, token, normalizer, vivekn, finisher])
>>> training = spark.createDataFrame([
...     ("I really liked this movie!", "positive"),
...     ("The cast was horrible", "negative"),
...     ("Never going to watch this again or recommend it to anyone", "negative"),
...     ("It's a waste of time", "negative"),
...     ("I loved the protagonist", "positive"),
...     ("The music was really really good", "positive")
... ]).toDF("text", "train_sentiment")
>>> pipelineModel = pipeline.fit(training)
>>> data = spark.createDataFrame([
...     ["I recommend this movie"],
...     ["Dont waste your time!!!"]
... ]).toDF("text")
>>> result = pipelineModel.transform(data)
>>> result.select("final_sentiment").show(truncate=False)
+---------------+
|final_sentiment|
+---------------+
|[positive]     |
|[negative]     |
+---------------+
- class ViveknSentimentModel(classname='com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentModel', java_model=None)
Sentiment analyser inspired by the algorithm by Vivek Narayanan.
This is the instantiated model of the ViveknSentimentApproach. For training your own model, please see the documentation of that class.
The analyzer requires sentence boundaries to give a score in context, and tokenization to make sure tokens are within bounds. Transitivity requirements also apply.
For extended examples of usage, see the Examples.
Input Annotation types | Output Annotation type
TOKEN, DOCUMENT | SENTIMENT
- Parameters:
- None
References
The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.
- static pretrained(name='sentiment_vivekn', lang='en', remote_loc=None)
Downloads and loads a pretrained model.
- Parameters:
- name : str, optional
Name of the pretrained model, by default "sentiment_vivekn"
- lang : str, optional
Language of the pretrained model, by default "en"
- remote_loc : str, optional
Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.
- Returns:
- ViveknSentimentModel
The restored model
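A minimal sketch of how the pretrained model might be placed in a pipeline, mirroring the stages of the training example for ViveknSentimentApproach. The helper name `build_pretrained_sentiment_pipeline` is hypothetical, and the input columns `["document", "normal"]` are assumed to match those used for training. Calling the helper requires Spark NLP, an active Spark session, and network access to download the model.

```python
def build_pretrained_sentiment_pipeline():
    """Assemble a pipeline around the pretrained model (hypothetical helper)."""
    # Imports are deferred so this definition loads even without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler, Finisher
    from sparknlp.annotator import Tokenizer, Normalizer, ViveknSentimentModel

    document = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")
    token = Tokenizer() \
        .setInputCols(["document"]) \
        .setOutputCol("token")
    normalizer = Normalizer() \
        .setInputCols(["token"]) \
        .setOutputCol("normal")
    # With no arguments, pretrained() uses name="sentiment_vivekn" and lang="en".
    vivekn = ViveknSentimentModel.pretrained() \
        .setInputCols(["document", "normal"]) \
        .setOutputCol("result_sentiment")
    # Converts the annotation column into plain strings.
    finisher = Finisher() \
        .setInputCols(["result_sentiment"]) \
        .setOutputCols("final_sentiment")
    return Pipeline(stages=[document, token, normalizer, vivekn, finisher])
```

In a session created with `sparknlp.start()`, the returned pipeline can be fitted on any DataFrame with a "text" column and then used to transform it. No training labels are needed because the sentiment stage is already a trained model.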