sparknlp.annotator.sentiment.sentiment_detector
Contains classes for the SentimentDetector.
Module Contents
Classes
- SentimentDetector: Trains a rule based sentiment detector, which calculates a score based on predefined keywords.
- SentimentDetectorModel: Rule based sentiment detector, which calculates a score based on predefined keywords.
- class SentimentDetector
Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.
A dictionary of predefined sentiment keywords must be provided with setDictionary(). Each line of the dictionary contains a word and its class (either positive or negative), separated by a delimiter. The dictionary can be set in the form of a delimited text file.
By default, the sentiment score is assigned the label "positive" if the score is >= 0, else "negative".
For extended examples of usage, see the Examples.
Input Annotation types: TOKEN, DOCUMENT
Output Annotation type: SENTIMENT
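The labeling rule described above can be illustrated with a minimal plain-Python sketch. It is not Spark NLP's implementation: the helper names `score_tokens` and `label_for` and the weights of +1 and -1 per keyword are assumptions made for illustration, and the library's actual scoring may weight keywords differently. Only the threshold (a score >= 0 maps to "positive") comes from the description above.

```python
from typing import Dict, Iterable

# Assumed weights for illustration only: +1 per positive keyword, -1 per negative keyword.
WEIGHTS = {"positive": 1.0, "negative": -1.0}


def score_tokens(tokens: Iterable[str], sentiment_dict: Dict[str, str]) -> float:
    """Sum the weights of all tokens found in the dictionary (hypothetical helper)."""
    return sum(
        WEIGHTS.get(sentiment_dict.get(token.lower(), ""), 0.0) for token in tokens
    )


def label_for(score: float) -> str:
    """Apply the documented default rule: 'positive' if score >= 0, else 'negative'."""
    return "positive" if score >= 0 else "negative"


# Same keyword/class pairs as the sample dictionary in the Examples section.
sentiment_dict = {
    "cool": "positive",
    "superb": "positive",
    "bad": "negative",
    "uninspired": "negative",
}

tokens = "The food was bad and uninspired".split()
score = score_tokens(tokens, sentiment_dict)
print(score, label_for(score))  # prints: -2.0 negative
```

Note that under this rule a text containing no dictionary keywords scores 0 and is therefore labeled "positive".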
- Parameters:
- dictionary
Path to the dictionary used for sentiment analysis
See also
ViveknSentimentApproach
for an alternative approach to sentiment extraction
Examples
In this example, the dictionary default-sentiment-dict.txt has the form of:
...
cool,positive
superb,positive
bad,negative
uninspired,negative
...
where each sentiment keyword is delimited by ",".
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> lemmatizer = Lemmatizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("lemma") \
...     .setDictionary("lemmas_small.txt", "->", "\t")
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["lemma", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary("default-sentiment-dict.txt", ",", ReadAs.TEXT)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     lemmatizer,
...     sentimentDetector,
... ])
>>> data = spark.createDataFrame([
...     ["The staff of the restaurant is nice"],
...     ["I recommend others to avoid because it is too expensive"]
... ]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)
+----------+
|result    |
+----------+
|[positive]|
|[negative]|
+----------+
- setDictionary(path, delimiter, read_as=ReadAs.TEXT, options={'format': 'text'})
Sets the path to the dictionary used for sentiment analysis.
- Parameters:
- path : str
Path to dictionary file
- delimiter : str
Delimiter for entries
- read_as : str, optional
How to read the resource, by default ReadAs.TEXT
- options : dict, optional
Options for reading the resource, by default {'format': 'text'}
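Before passing a file to setDictionary(), it can help to check that every line really has the word, delimiter, class shape. The sketch below does this in plain Python without Spark. The function `load_sentiment_dictionary` is a hypothetical helper, not part of Spark NLP, and the restriction to the classes positive and negative is an assumption taken from the class description.

```python
import os
import tempfile
from typing import Dict


def load_sentiment_dictionary(path: str, delimiter: str) -> Dict[str, str]:
    """Parse a delimited keyword file into {word: class} (hypothetical helper)."""
    entries: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            word, sep, label = line.partition(delimiter)
            label = label.strip()
            # Assumption: only the two classes named in the documentation are valid.
            if not sep or label not in ("positive", "negative"):
                raise ValueError(
                    f"line {line_number}: expected word{delimiter}class, got {line!r}"
                )
            entries[word.strip()] = label
    return entries


# Write a small file in the same form as default-sentiment-dict.txt, then parse it.
with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, "default-sentiment-dict.txt")
    with open(dict_path, "w", encoding="utf-8") as handle:
        handle.write("cool,positive\nsuperb,positive\nbad,negative\nuninspired,negative\n")
    parsed = load_sentiment_dictionary(dict_path, ",")

print(parsed)
```

A file that passes this check could then be handed to the annotator with the same delimiter, as in `setDictionary(dict_path, ",", ReadAs.TEXT)` from the Examples section.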
- class SentimentDetectorModel(classname='com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetectorModel', java_model=None)
Rule-based sentiment detector, which calculates a score based on predefined keywords.
This is the instantiated model of the SentimentDetector. For training your own model, please see the documentation of that class.
By default, the sentiment score is assigned the label "positive" if the score is >= 0, else "negative".
For extended examples of usage, see the Examples.
Input Annotation types: TOKEN, DOCUMENT
Output Annotation type: SENTIMENT
- Parameters:
- None