sparknlp.annotator.sentiment.sentiment_detector
Contains classes for the SentimentDetector.
Module Contents
Classes
SentimentDetector: Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.
SentimentDetectorModel: Rule-based sentiment detector, which calculates a score based on predefined keywords.
- class SentimentDetector
Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.
A dictionary of predefined sentiment keywords must be provided with setDictionary(), where each line contains a word and its class (either positive or negative), separated by a delimiter. The dictionary can be set in the form of a delimited text file.
By default, the sentiment score will be assigned the label "positive" if the score is >= 0, else "negative".
For extended examples of usage, see the Examples.
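The default labeling rule described above can be illustrated without Spark. The following is a minimal sketch, not Spark NLP's implementation: it assumes that each positive keyword adds 1 to the score and each negative keyword subtracts 1, and that lookups are case-insensitive. The names `score_tokens` and `label_for` are hypothetical. Only the threshold (a score >= 0 gives "positive") comes from the description above.

```python
# Illustrative sketch only. The +1/-1 weighting and the lowercasing are
# assumptions, not taken from the Spark NLP source.

def score_tokens(tokens, sentiment_dict):
    """Sum +1 for each positive keyword and -1 for each negative keyword."""
    score = 0
    for token in tokens:
        sentiment = sentiment_dict.get(token.lower())
        if sentiment == "positive":
            score += 1
        elif sentiment == "negative":
            score -= 1
    return score


def label_for(score):
    """Apply the documented default: "positive" if score >= 0, else "negative"."""
    return "positive" if score >= 0 else "negative"


sentiment_dict = {
    "cool": "positive",
    "superb": "positive",
    "bad": "negative",
    "uninspired": "negative",
}

tokens = "The plot was uninspired and the acting was bad".split()
score = score_tokens(tokens, sentiment_dict)
print(score, label_for(score))
```

Under these assumptions, a text that contains no dictionary keywords scores 0 and is therefore labeled "positive", which follows directly from the >= 0 threshold.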
Input Annotation types: TOKEN, DOCUMENT
Output Annotation type: SENTIMENT
- Parameters:
- dictionary
Path to the dictionary used for sentiment analysis
See also
ViveknSentimentApproach for an alternative approach to sentiment extraction
Examples
In this example, the dictionary default-sentiment-dict.txt has the form of:

...
cool,positive
superb,positive
bad,negative
uninspired,negative
...

where each sentiment keyword is delimited by ",".

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> lemmatizer = Lemmatizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("lemma") \
...     .setDictionary("lemmas_small.txt", "->", "\t")
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["lemma", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary("default-sentiment-dict.txt", ",", ReadAs.TEXT)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     lemmatizer,
...     sentimentDetector,
... ])
>>> data = spark.createDataFrame([
...     ["The staff of the restaurant is nice"],
...     ["I recommend others to avoid because it is too expensive"]
... ]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)
+----------+
|result    |
+----------+
|[positive]|
|[negative]|
+----------+
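To make the dictionary format concrete, here is a small sketch in plain Python that writes a file in the form shown above and reads it back using the same "," delimiter. The helper name `read_sentiment_dict`, the file name `sentiment-dict.txt`, and the temporary directory are hypothetical and for illustration only; Spark NLP loads the file itself through setDictionary().

```python
import os
import tempfile


def read_sentiment_dict(path, delimiter):
    """Parse lines of the form <word><delimiter><class> into a dict."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, sentiment = line.split(delimiter, 1)
            entries[word] = sentiment
    return entries


# Write a tiny dictionary in the same form as default-sentiment-dict.txt.
lines = ["cool,positive", "superb,positive", "bad,negative", "uninspired,negative"]
with tempfile.TemporaryDirectory() as tmp_dir:
    dict_path = os.path.join(tmp_dir, "sentiment-dict.txt")  # hypothetical file name
    with open(dict_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    parsed = read_sentiment_dict(dict_path, ",")

print(parsed)
```

A quick check like this can catch malformed lines, such as a missing delimiter, before the file is passed to the annotator.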
- setDictionary(path, delimiter, read_as=ReadAs.TEXT, options={'format': 'text'})
Sets the path to the dictionary used for sentiment analysis.
- Parameters:
- path : str
Path to dictionary file
- delimiter : str
Delimiter for entries
- read_as : str, optional
How to read the resource, by default ReadAs.TEXT
- options : dict, optional
Options for reading the resource, by default {'format': 'text'}
- class SentimentDetectorModel(classname='com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetectorModel', java_model=None)
Rule-based sentiment detector, which calculates a score based on predefined keywords.
This is the instantiated model of the SentimentDetector. For training your own model, please see the documentation of that class.
By default, the sentiment score will be assigned the label "positive" if the score is >= 0, else "negative".
For extended examples of usage, see the Examples.
Input Annotation types: TOKEN, DOCUMENT
Output Annotation type: SENTIMENT
- Parameters:
- None