sparknlp.annotator.sentiment.sentiment_detector#

Contains classes for the SentimentDetector.

Module Contents#

Classes#

SentimentDetector

Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

SentimentDetectorModel

Rule-based sentiment detector, which calculates a score based on predefined keywords.

class SentimentDetector[source]#

Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

A dictionary of predefined sentiment keywords must be provided with setDictionary(), where each line of the delimited text file contains a keyword and its class (either positive or negative), separated by a delimiter.

By default, the sentiment score is assigned the label "positive" if the score is >= 0, and "negative" otherwise.

For extended examples of usage, see the Examples.

Input Annotation types: TOKEN, DOCUMENT

Output Annotation type: SENTIMENT

Parameters:
dictionary

Path to the dictionary file used for sentiment analysis

See also

ViveknSentimentApproach

for an alternative approach to sentiment extraction

Examples

In this example, the dictionary default-sentiment-dict.txt has the following form:

...
cool,positive
superb,positive
bad,negative
uninspired,negative
...

where each sentiment keyword is delimited from its class by ",".

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> lemmatizer = Lemmatizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("lemma") \
...     .setDictionary("lemmas_small.txt", "->", "\t")
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["lemma", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary("default-sentiment-dict.txt", ",", ReadAs.TEXT)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     lemmatizer,
...     sentimentDetector,
... ])
>>> data = spark.createDataFrame([
...     ["The staff of the restaurant is nice"],
...     ["I recommend others to avoid because it is too expensive"]
... ]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)
+----------+
|result    |
+----------+
|[positive]|
|[negative]|
+----------+
setDictionary(path, delimiter, read_as=ReadAs.TEXT, options={'format': 'text'})[source]#

Sets the path to the dictionary used for sentiment analysis.

Parameters:
path : str

Path to the dictionary file

delimiter : str

Delimiter for entries

read_as : str, optional

How to read the resource, by default ReadAs.TEXT

options : dict, optional

Options for reading the resource, by default {'format': 'text'}
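
A minimal usage sketch, passing the delimiter, read_as, and options arguments explicitly (the dictionary path is a placeholder, the options shown are the defaults, and the ReadAs import from sparknlp.common is an assumption about the package layout):

>>> from sparknlp.annotator import SentimentDetector
>>> from sparknlp.common import ReadAs
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["token", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary(
...         "default-sentiment-dict.txt",
...         delimiter=",",
...         read_as=ReadAs.TEXT,
...         options={"format": "text"}
...     )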

class SentimentDetectorModel(classname='com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetectorModel', java_model=None)[source]#

Rule-based sentiment detector, which calculates a score based on predefined keywords.

This is the instantiated model of the SentimentDetector. For training your own model, please see the documentation of that class.

By default, the sentiment score is assigned the label "positive" if the score is >= 0, and "negative" otherwise.

For extended examples of usage, see the Examples.

Input Annotation types: TOKEN, DOCUMENT

Output Annotation type: SENTIMENT

Parameters:
None
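
A SentimentDetectorModel is typically obtained by fitting a pipeline that contains a SentimentDetector rather than constructed directly. A minimal sketch reusing the pipeline and data from the example above (the stage index assumes the sentiment detector is the last pipeline stage):

>>> pipelineModel = pipeline.fit(data)
>>> sentimentModel = pipelineModel.stages[-1]  # the fitted SentimentDetectorModel
>>> result = pipelineModel.transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)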