sparknlp.annotator.sentiment.sentiment_detector#

Contains classes for the SentimentDetector.

Module Contents#

Classes#

SentimentDetector

Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

SentimentDetectorModel

Rule-based sentiment detector, which calculates a score based on predefined keywords.

class SentimentDetector[source]#

Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

A dictionary of predefined sentiment keywords must be provided with setDictionary(), where each line of the delimited text file contains a keyword and its class (either positive or negative), separated by a delimiter.

By default, the sentiment score is assigned the label "positive" if the score is >= 0, and "negative" otherwise.

For extended examples of usage, see the Examples.

Input Annotation types: TOKEN, DOCUMENT

Output Annotation type: SENTIMENT

Parameters:
dictionary

Path to the dictionary file used for sentiment analysis

See also

ViveknSentimentApproach

for an alternative approach to sentiment extraction

Examples

In this example, the dictionary default-sentiment-dict.txt has the following form:

...
cool,positive
superb,positive
bad,negative
uninspired,negative
...

where each sentiment keyword is delimited from its class by ",".

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> lemmatizer = Lemmatizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("lemma") \
...     .setDictionary("lemmas_small.txt", "->", "\t")
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["lemma", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary("default-sentiment-dict.txt", ",", ReadAs.TEXT)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     lemmatizer,
...     sentimentDetector,
... ])
>>> data = spark.createDataFrame([
...     ["The staff of the restaurant is nice"],
...     ["I recommend others to avoid because it is too expensive"]
... ]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)
+----------+
|result    |
+----------+
|[positive]|
|[negative]|
+----------+
setDictionary(path, delimiter, read_as=ReadAs.TEXT, options={'format': 'text'})[source]#

Sets the path to the dictionary used for sentiment analysis.

Parameters:
path : str

Path to the dictionary file

delimiter : str

Delimiter for entries

read_as : str, optional

How to read the resource, by default ReadAs.TEXT

options : dict, optional

Options for reading the resource, by default {'format': 'text'}
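
A minimal usage sketch, passing the delimiter, read_as, and options arguments explicitly (the dictionary path is a placeholder, the options shown are the defaults, and the ReadAs import from sparknlp.common is an assumption about the package layout):

>>> from sparknlp.annotator import SentimentDetector
>>> from sparknlp.common import ReadAs
>>> sentimentDetector = SentimentDetector() \
...     .setInputCols(["token", "document"]) \
...     .setOutputCol("sentimentScore") \
...     .setDictionary(
...         "default-sentiment-dict.txt",
...         delimiter=",",
...         read_as=ReadAs.TEXT,
...         options={"format": "text"}
...     )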

class SentimentDetectorModel(classname='com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetectorModel', java_model=None)[source]#

Rule-based sentiment detector, which calculates a score based on predefined keywords.

This is the instantiated model of the SentimentDetector. For training your own model, please see the documentation of that class.

By default, the sentiment score is assigned the label "positive" if the score is >= 0, and "negative" otherwise.

For extended examples of usage, see the Examples.

Input Annotation types: TOKEN, DOCUMENT

Output Annotation type: SENTIMENT

Parameters:
None
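
A SentimentDetectorModel is typically obtained by fitting a pipeline that contains a SentimentDetector rather than constructed directly. A minimal sketch reusing the pipeline and data from the example above (the stage index assumes the sentiment detector is the last pipeline stage):

>>> pipelineModel = pipeline.fit(data)
>>> sentimentModel = pipelineModel.stages[-1]  # the fitted SentimentDetectorModel
>>> result = pipelineModel.transform(data)
>>> result.selectExpr("sentimentScore.result").show(truncate=False)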