sparknlp.annotator.sentiment.vivekn_sentiment
Contains classes for ViveknSentiment.
Module Contents
Classes
| ViveknSentimentApproach | Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan. |
| ViveknSentimentModel | Sentiment analyser inspired by the algorithm by Vivek Narayanan. |
- class ViveknSentimentApproach
- Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.
- The analyser requires sentence boundaries to give a score in context. Tokenization is needed to make sure tokens are within bounds. Transitive requirements must also be met.
- The training data needs to consist of a column for normalized text and a label column (either "positive" or "negative").
- For extended examples of usage, see the Examples.
| Input Annotation types | Output Annotation type |
| TOKEN, DOCUMENT | SENTIMENT |
- Parameters:
- sentimentCol
- Column with the sentiment result of every row. Must be ‘positive’ or ‘negative’.
- pruneCorpus
- Removes infrequent scenarios from scope. The higher the value, the better the performance. Defaults to 1.
 
- References
- The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.
- Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> document = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> token = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> normalizer = Normalizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("normal")
>>> vivekn = ViveknSentimentApproach() \
...     .setInputCols(["document", "normal"]) \
...     .setSentimentCol("train_sentiment") \
...     .setOutputCol("result_sentiment")
>>> finisher = Finisher() \
...     .setInputCols(["result_sentiment"]) \
...     .setOutputCols("final_sentiment")
>>> pipeline = Pipeline().setStages([document, token, normalizer, vivekn, finisher])
>>> training = spark.createDataFrame([
...     ("I really liked this movie!", "positive"),
...     ("The cast was horrible", "negative"),
...     ("Never going to watch this again or recommend it to anyone", "negative"),
...     ("It's a waste of time", "negative"),
...     ("I loved the protagonist", "positive"),
...     ("The music was really really good", "positive")
... ]).toDF("text", "train_sentiment")
>>> pipelineModel = pipeline.fit(training)
>>> data = spark.createDataFrame([
...     ["I recommend this movie"],
...     ["Dont waste your time!!!"]
... ]).toDF("text")
>>> result = pipelineModel.transform(data)
>>> result.select("final_sentiment").show(truncate=False)
+---------------+
|final_sentiment|
+---------------+
|[positive]     |
|[negative]     |
+---------------+
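To make the Naive Bayes idea behind the cited paper more concrete, here is a minimal pure-Python sketch. It is not Spark NLP's implementation and leaves out the enhancements the paper describes. The names `train_naive_bayes` and `classify`, the whitespace tokenization, the add-one smoothing, and the `prune_below` threshold (a loose analogue of pruneCorpus) are all assumptions made for illustration.

```python
import math
from collections import Counter

LABELS = ("positive", "negative")


def train_naive_bayes(examples, prune_below=1):
    """Count tokens per label and keep only tokens seen at least `prune_below` times.

    `examples` is an iterable of (text, label) pairs; `prune_below` is a
    hypothetical stand-in for a corpus-pruning threshold.
    """
    token_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        if label not in token_counts:
            raise ValueError(f"label must be one of {LABELS}, got {label!r}")
        doc_counts[label] += 1
        token_counts[label].update(text.lower().split())
    if any(doc_counts[label] == 0 for label in LABELS):
        raise ValueError("training data needs at least one example per label")

    # Prune tokens whose total frequency across both labels is too low.
    totals = token_counts["positive"] + token_counts["negative"]
    vocab = {tok for tok, n in totals.items() if n >= prune_below}
    for counts in token_counts.values():
        for tok in list(counts):
            if tok not in vocab:
                del counts[tok]
    return token_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability under the model."""
    token_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in token_counts.items():
        score = math.log(doc_counts[label] / n_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)     # add-one smoothing
        for tok in text.lower().split():
            if tok in vocab:                          # unseen tokens are skipped
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


examples = [
    ("i really liked this movie", "positive"),
    ("the cast was horrible", "negative"),
    ("i loved the protagonist", "positive"),
    ("a horrible waste of time", "negative"),
]
model = train_naive_bayes(examples)
print(classify(model, "i liked the protagonist"))
```

Raising `prune_below` shrinks the vocabulary, which trades some accuracy on rare words for a smaller model; whether this matches the exact rule pruneCorpus applies is not stated in this documentation.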
- class ViveknSentimentModel(classname='com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentModel', java_model=None)
- Sentiment analyser inspired by the algorithm by Vivek Narayanan.
- This is the instantiated model of the ViveknSentimentApproach. For training your own model, please see the documentation of that class.
- The analyser requires sentence boundaries to give a score in context. Tokenization is needed to make sure tokens are within bounds. Transitive requirements must also be met.
- For extended examples of usage, see the Examples.
| Input Annotation types | Output Annotation type |
| TOKEN, DOCUMENT | SENTIMENT |
- Parameters:
- None
 
- References
- The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.
- static pretrained(name='sentiment_vivekn', lang='en', remote_loc=None)
- Downloads and loads a pretrained model.
- Parameters:
- name : str, optional
- Name of the pretrained model, by default “sentiment_vivekn”
- lang : str, optional
- Language of the pretrained model, by default “en”
- remote_loc : str, optional
- Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.
 
- Returns:
- ViveknSentimentModel
- The restored model
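As a rough illustration of how pretrained() might be used, the sketch below swaps the pretrained model into the same kind of pipeline shown for ViveknSentimentApproach. The helper names `build_pretrained_sentiment_pipeline` and `run_demo` are hypothetical, and the sketch assumes Spark NLP is installed and that the default "sentiment_vivekn" model can be downloaded.

```python
def build_pretrained_sentiment_pipeline():
    """Assemble a pipeline around the pretrained model (hypothetical helper).

    Call this only after a Spark session has been started, for example with
    sparknlp.start(), because the annotators are backed by JVM objects.
    """
    # Imports are deferred so that defining this function does not itself
    # require Spark NLP to be importable.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import Normalizer, Tokenizer, ViveknSentimentModel
    from sparknlp.base import DocumentAssembler, Finisher

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    normalizer = Normalizer().setInputCols(["token"]).setOutputCol("normal")
    # With no arguments, pretrained() uses name="sentiment_vivekn" and lang="en".
    vivekn = ViveknSentimentModel.pretrained() \
        .setInputCols(["document", "normal"]) \
        .setOutputCol("result_sentiment")
    finisher = Finisher() \
        .setInputCols(["result_sentiment"]) \
        .setOutputCols("final_sentiment")
    return Pipeline().setStages([document, token, normalizer, vivekn, finisher])


def run_demo():
    """Start Spark NLP, score one sentence, and show the result (hypothetical helper)."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([["I recommend this movie"]]).toDF("text")
    # No stage needs training here, so fit() only wires the stages together.
    model = build_pretrained_sentiment_pipeline().fit(data)
    model.transform(data).select("final_sentiment").show(truncate=False)
```

Calling `run_demo()` downloads the model on first use, so it needs network access unless the model is already cached locally.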