class SentimentDetector extends AnnotatorApproach[SentimentDetectorModel]

Trains a rule based sentiment detector, which calculates a score based on predefined keywords.

A dictionary of predefined sentiment keywords must be provided with setDictionary, where each line pairs a word with its class (either positive or negative), separated by a delimiter. The dictionary can be set either as a delimited text file or directly as an ExternalResource.

By default, the sentiment score is mapped to the label "positive" if it is >= 0 and "negative" otherwise. To retrieve the raw sentiment scores instead, set enableScore to true.
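
For instance, a minimal sketch of passing the dictionary as an ExternalResource and enabling raw scores (the dictionary path here is hypothetical):

import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector
import com.johnsnowlabs.nlp.util.io.{ExternalResource, ReadAs}

// Hypothetical dictionary path; "delimiter" separates each word from its label,
// "format" matches the default used by the convenience setter.
val sentimentDict = ExternalResource(
  "sentiment-dict.txt",
  ReadAs.TEXT,
  Map("delimiter" -> ",", "format" -> "text")
)

val rawScoreDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary(sentimentDict)
  .setEnableScore(true) // output the raw double score instead of "positive"/"negative"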

For extended examples of usage, see the Examples and the SentimentTestSpec.

Example

In this example, the dictionary default-sentiment-dict.txt has the form of

...
cool,positive
superb,positive
bad,negative
uninspired,negative
...

where each sentiment keyword is delimited by ",".

import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.Tokenizer
import com.johnsnowlabs.nlp.annotators.Lemmatizer
import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val lemmatizer = new Lemmatizer()
  .setInputCols("token")
  .setOutputCol("lemma")
  .setDictionary("src/test/resources/lemma-corpus-small/lemmas_small.txt", "->", "\t")

val sentimentDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  lemmatizer,
  sentimentDetector
))

val data = Seq(
  "The staff of the restaurant is nice",
  "I recommend others to avoid because it is too expensive"
).toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("sentimentScore.result").show(false)
+----------+  //  +------+ for enableScore set to true
|result    |  //  |result|
+----------+  //  +------+
|[positive]|  //  |[1.0] |
|[negative]|  //  |[-2.0]|
+----------+  //  +------+
See also

ViveknSentimentApproach for an alternative approach to sentiment extraction

Linear Supertypes
AnnotatorApproach[SentimentDetectorModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[SentimentDetectorModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Parameters

A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively. A short sketch combining several of these setters follows the list.

  1. val decrementMultiplier: DoubleParam

    Multiplier for decrement sentiments (Default: -2.0)

  2. val dictionary: ExternalResourceParam

    Delimited file with a list of sentiment tags per word (either positive or negative). Requires 'delimiter' in options.

    Example

    cool,positive
    superb,positive
    bad,negative
    uninspired,negative

    where the 'delimiter' option was set with Map("delimiter" -> ",")

  3. val enableScore: BooleanParam

    If true, the score is output as a double value, else the string "positive" or "negative" is output (Default: false)

  4. val incrementMultiplier: DoubleParam

    Multiplier for increment sentiments (Default: 2.0)

  5. val negativeMultiplier: DoubleParam

    Multiplier for negative sentiments (Default: -1.0)

  6. val positiveMultiplier: DoubleParam

    Multiplier for positive sentiments (Default: 1.0)

  7. val reverseMultiplier: DoubleParam

    Multiplier for revert sentiments (Default: -1.0)

  8. def setDecrementMultiplier(v: Double): SentimentDetector.this.type

    Multiplier for decrement sentiments (Default: -2.0)

  9. def setDictionary(path: String, delimiter: String, readAs: Format, options: Map[String, String] = Map("format" -> "text")): SentimentDetector.this.type

    Delimited file with a list of sentiment tags per word. Requires 'delimiter' in options in order to separate words from sentiment tags.

  10. def setDictionary(value: ExternalResource): SentimentDetector.this.type

    Delimited file with a list of sentiment tags per word. Requires 'delimiter' in options in order to separate words from sentiment tags.

  11. def setEnableScore(v: Boolean): SentimentDetector.this.type

    If true, the score is output as a double value, else the string "positive" or "negative" is output (Default: false)

  12. def setIncrementMultiplier(v: Double): SentimentDetector.this.type

    Multiplier for increment sentiments (Default: 2.0)

  13. def setNegativeMultiplier(v: Double): SentimentDetector.this.type

    Multiplier for negative sentiments (Default: -1.0)

  14. def setPositiveMultiplier(v: Double): SentimentDetector.this.type

    Multiplier for positive sentiments (Default: 1.0)

  15. def setReverseMultiplier(v: Double): SentimentDetector.this.type

    Multiplier for revert sentiments (Default: -1.0)
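
The multiplier parameters above weight the different classes of dictionary entries when the sentiment score is computed. A minimal sketch of setting them explicitly, assuming the imports, dictionary and columns from the example above (the values shown simply restate the defaults):

// Sketch only: these calls restate the default multiplier values to show the API;
// adjust them to change how each dictionary class contributes to the score.
val tunedSentimentDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)
  .setPositiveMultiplier(1.0)   // words tagged positive
  .setNegativeMultiplier(-1.0)  // words tagged negative
  .setIncrementMultiplier(2.0)  // increment words
  .setDecrementMultiplier(-2.0) // decrement words
  .setReverseMultiplier(-1.0)   // revert words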

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotation types: TOKEN, DOCUMENT

    Definition Classes
    SentimentDetector → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output annotation type: SENTIMENT

    Definition Classes
    SentimentDetector → HasOutputAnnotatorType
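
The output column produced under these types holds Spark NLP Annotation structs rather than plain strings. As a brief sketch, assuming the result DataFrame from the example above, the full annotations can be inspected like this:

// Explode the SENTIMENT annotations to see the annotator type, the
// label (or raw score, if enableScore is true) and the metadata.
result
  .selectExpr("explode(sentimentScore) AS sentiment")
  .selectExpr("sentiment.annotatorType", "sentiment.result", "sentiment.metadata")
  .show(false)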

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  2. final def clear(param: Param[_]): SentimentDetector.this.type
    Definition Classes
    Params
  3. final def copy(extra: ParamMap): Estimator[SentimentDetectorModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  4. val description: String

    Rule based sentiment detector

    Definition Classes
    SentimentDetector → AnnotatorApproach
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. final def fit(dataset: Dataset[_]): SentimentDetectorModel
    Definition Classes
    AnnotatorApproach → Estimator
  10. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[SentimentDetectorModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  11. def fit(dataset: Dataset[_], paramMap: ParamMap): SentimentDetectorModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  12. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SentimentDetectorModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  13. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  14. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  15. def getInputCols: Array[String]

    returns

    input annotations columns currently used

    Definition Classes
    HasInputAnnotationCols
  16. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  17. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  18. final def getOutputCol: String

    Gets the name of the annotation column that will be generated

    Definition Classes
    HasOutputAnnotationCol
  19. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  20. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  21. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  22. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  23. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  24. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  25. def onTrained(model: SentimentDetectorModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  26. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  27. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  28. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  29. final def set[T](param: Param[T], value: T): SentimentDetector.this.type
    Definition Classes
    Params
  30. final def setInputCols(value: String*): SentimentDetector.this.type
    Definition Classes
    HasInputAnnotationCols
  31. def setInputCols(value: Array[String]): SentimentDetector.this.type

    Overrides the required annotator columns if different from the default

    Definition Classes
    HasInputAnnotationCols
  32. def setLazyAnnotator(value: Boolean): SentimentDetector.this.type
    Definition Classes
    CanBeLazy
  33. final def setOutputCol(value: String): SentimentDetector.this.type

    Overrides the annotation column name used when transforming

    Definition Classes
    HasOutputAnnotationCol
  34. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  35. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): SentimentDetectorModel
    Definition Classes
    SentimentDetector → AnnotatorApproach
  36. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit().

    Definition Classes
    AnnotatorApproach → PipelineStage
  37. val uid: String
    Definition Classes
    SentimentDetector → Identifiable
  38. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable