com.johnsnowlabs.nlp.annotators.sda.pragmatic
SentimentDetector
Companion object SentimentDetector
class SentimentDetector extends AnnotatorApproach[SentimentDetectorModel]
Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

A dictionary of predefined sentiment keywords must be provided with setDictionary, where each line is a word delimited to its class (either positive or negative). The dictionary can be set either in the form of a delimited text file or directly as an ExternalResource.

By default, the sentiment score will be assigned the label "positive" if the score is >= 0, else "negative". To retrieve the raw sentiment scores, enableScore needs to be set to true.

For extended examples of usage, see the Examples and the SentimentTestSpec.
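The labeling rule described above amounts to a simple threshold on the accumulated keyword score. A minimal sketch of that rule in plain Scala (illustrative only, not the library's internal implementation; the helper names are hypothetical, and the multiplier values follow the parameter defaults documented below):

```scala
// Illustrative sketch: each matched dictionary keyword contributes its class
// multiplier to a running score; the final label is a threshold on that sum.
def scoreTokens(tokens: Seq[String], dict: Map[String, String]): Double =
  tokens.map(dict.get).collect {
    case Some("positive") => 1.0  // positiveMultiplier default
    case Some("negative") => -1.0 // negativeMultiplier default
  }.sum

def label(score: Double): String = if (score >= 0) "positive" else "negative"

val dict = Map("nice" -> "positive", "avoid" -> "negative", "expensive" -> "negative")
label(scoreTokens(Seq("the", "staff", "is", "nice"), dict))            // "positive" (score 1.0)
label(scoreTokens(Seq("avoid", "it", "is", "too", "expensive"), dict)) // "negative" (score -2.0)
```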
Example

In this example, the dictionary default-sentiment-dict.txt has the form of

...
cool,positive
superb,positive
bad,negative
uninspired,negative
...

where each sentiment keyword is delimited by ",".

import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.Tokenizer
import com.johnsnowlabs.nlp.annotators.Lemmatizer
import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val lemmatizer = new Lemmatizer()
  .setInputCols("token")
  .setOutputCol("lemma")
  .setDictionary("src/test/resources/lemma-corpus-small/lemmas_small.txt", "->", "\t")

val sentimentDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  lemmatizer,
  sentimentDetector
))

val data = Seq(
  "The staff of the restaurant is nice",
  "I recommend others to avoid because it is too expensive"
).toDF("text")

val result = pipeline.fit(data).transform(data)
result.selectExpr("sentimentScore.result").show(false)

+----------+  //  +------+ for enableScore set to true
|result    |  //  |result|
+----------+  //  +------+
|[positive]|  //  |[1.0] |
|[negative]|  //  |[-2.0]|
+----------+  //  +------+
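To obtain raw numeric scores like those shown in the commented column above, enableScore can be set to true on the detector. A sketch, reusing the stages and data defined in the example:

```scala
// Same pipeline as above, but outputting raw scores instead of labels.
// Assumes documentAssembler, tokenizer, lemmatizer and data are defined
// as in the example; only the detector changes.
val scoringDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)
  .setEnableScore(true)

val scoringPipeline = new Pipeline().setStages(Array(
  documentAssembler, tokenizer, lemmatizer, scoringDetector
))

scoringPipeline.fit(data).transform(data)
  .selectExpr("sentimentScore.result").show(false)
// The result column now holds the raw scores, e.g. [1.0] and [-2.0]
```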
- See also
ViveknSentimentApproach for an alternative approach to sentiment extraction
Inheritance
- SentimentDetector
- AnnotatorApproach
- CanBeLazy
- DefaultParamsWritable
- MLWritable
- HasOutputAnnotatorType
- HasOutputAnnotationCol
- HasInputAnnotationCols
- Estimator
- PipelineStage
- Logging
- Params
- Serializable
- Serializable
- Identifiable
- AnyRef
- Any
Parameters
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
- val decrementMultiplier: DoubleParam
  Multiplier for decrement sentiments (Default: -2.0)
- val dictionary: ExternalResourceParam
  Delimited file with a list of sentiment tags per word (either positive or negative). Requires 'delimiter' in options.
  Example:
    cool,positive
    superb,positive
    bad,negative
    uninspired,negative
  where the 'delimiter' option was set with Map("delimiter" -> ",")
- val enableScore: BooleanParam
  If true, the score will be output as a double value, otherwise as the string "positive" or "negative" (Default: false)
- val incrementMultiplier: DoubleParam
  Multiplier for increment sentiments (Default: 2.0)
- val negativeMultiplier: DoubleParam
  Multiplier for negative sentiments (Default: -1.0)
- val positiveMultiplier: DoubleParam
  Multiplier for positive sentiments (Default: 1.0)
- val reverseMultiplier: DoubleParam
  Multiplier for revert sentiments (Default: -1.0)
- def setDecrementMultiplier(v: Double): SentimentDetector.this.type
  Sets the multiplier for decrement sentiments (Default: -2.0)
- def setDictionary(path: String, delimiter: String, readAs: Format, options: Map[String, String] = Map("format" -> "text")): SentimentDetector.this.type
  Sets the delimited file with a list of sentiment tags per word. The dictionary requires 'delimiter' in options in order to separate words from sentiment tags.
- def setDictionary(value: ExternalResource): SentimentDetector.this.type
  Sets the delimited file with a list of sentiment tags per word. The dictionary requires 'delimiter' in options in order to separate words from sentiment tags.
- def setEnableScore(v: Boolean): SentimentDetector.this.type
  If true, the score will be output as a double value, otherwise as the string "positive" or "negative" (Default: false)
- def setIncrementMultiplier(v: Double): SentimentDetector.this.type
  Sets the multiplier for increment sentiments (Default: 2.0)
- def setNegativeMultiplier(v: Double): SentimentDetector.this.type
  Sets the multiplier for negative sentiments (Default: -1.0)
- def setPositiveMultiplier(v: Double): SentimentDetector.this.type
  Sets the multiplier for positive sentiments (Default: 1.0)
- def setReverseMultiplier(v: Double): SentimentDetector.this.type
  Sets the multiplier for revert sentiments (Default: -1.0)
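The multipliers above tune how much each keyword class contributes to the accumulated score. A hypothetical configuration sketch (the values shown are arbitrary illustrations, not recommendations; the defaults are noted next to each parameter above):

```scala
// Adjusting the contribution of each sentiment class to the final score.
val tunedDetector = new SentimentDetector()
  .setInputCols("lemma", "document")
  .setOutputCol("sentimentScore")
  .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)
  .setPositiveMultiplier(1.5)   // weight positive keywords more heavily than the 1.0 default
  .setNegativeMultiplier(-1.0)
  .setIncrementMultiplier(2.0)  // applied to increment sentiments
  .setDecrementMultiplier(-2.0) // applied to decrement sentiments
  .setReverseMultiplier(-1.5)   // applied to revert sentiments
```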
Annotator types
Required input and expected output annotator types
- val inputAnnotatorTypes: Array[AnnotatorType]
  Input annotation types: TOKEN, DOCUMENT
  - Definition Classes: SentimentDetector → HasInputAnnotationCols
- val outputAnnotatorType: AnnotatorType
  Output annotation type: SENTIMENT
  - Definition Classes: SentimentDetector → HasOutputAnnotatorType
Members
- type AnnotatorType = String
  - Definition Classes: HasOutputAnnotatorType
- def beforeTraining(spark: SparkSession): Unit
  - Definition Classes: AnnotatorApproach
- final def clear(param: Param[_]): SentimentDetector.this.type
  - Definition Classes: Params
- final def copy(extra: ParamMap): Estimator[SentimentDetectorModel]
  - Definition Classes: AnnotatorApproach → Estimator → PipelineStage → Params
- val description: String
  Rule-based sentiment detector
  - Definition Classes: SentimentDetector → AnnotatorApproach
- def explainParam(param: Param[_]): String
  - Definition Classes: Params
- def explainParams(): String
  - Definition Classes: Params
- final def extractParamMap(): ParamMap
  - Definition Classes: Params
- final def extractParamMap(extra: ParamMap): ParamMap
  - Definition Classes: Params
- final def fit(dataset: Dataset[_]): SentimentDetectorModel
  - Definition Classes: AnnotatorApproach → Estimator
- def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[SentimentDetectorModel]
  - Definition Classes: Estimator
  - Annotations: @Since("2.0.0")
- def fit(dataset: Dataset[_], paramMap: ParamMap): SentimentDetectorModel
  - Definition Classes: Estimator
  - Annotations: @Since("2.0.0")
- def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SentimentDetectorModel
  - Definition Classes: Estimator
  - Annotations: @Since("2.0.0") @varargs()
- final def get[T](param: Param[T]): Option[T]
  - Definition Classes: Params
- final def getDefault[T](param: Param[T]): Option[T]
  - Definition Classes: Params
- def getInputCols: Array[String]
  - returns: input annotation columns currently used
  - Definition Classes: HasInputAnnotationCols
- def getLazyAnnotator: Boolean
  - Definition Classes: CanBeLazy
- final def getOrDefault[T](param: Param[T]): T
  - Definition Classes: Params
- final def getOutputCol: String
  Gets the annotation column name to be generated
  - Definition Classes: HasOutputAnnotationCol
- def getParam(paramName: String): Param[Any]
  - Definition Classes: Params
- final def hasDefault[T](param: Param[T]): Boolean
  - Definition Classes: Params
- def hasParam(paramName: String): Boolean
  - Definition Classes: Params
- final def isDefined(param: Param[_]): Boolean
  - Definition Classes: Params
- final def isSet(param: Param[_]): Boolean
  - Definition Classes: Params
- val lazyAnnotator: BooleanParam
  - Definition Classes: CanBeLazy
- def onTrained(model: SentimentDetectorModel, spark: SparkSession): Unit
  - Definition Classes: AnnotatorApproach
- val optionalInputAnnotatorTypes: Array[String]
  - Definition Classes: HasInputAnnotationCols
- lazy val params: Array[Param[_]]
  - Definition Classes: Params
- def save(path: String): Unit
  - Definition Classes: MLWritable
  - Annotations: @Since("1.6.0") @throws(...)
- final def set[T](param: Param[T], value: T): SentimentDetector.this.type
  - Definition Classes: Params
- final def setInputCols(value: String*): SentimentDetector.this.type
  - Definition Classes: HasInputAnnotationCols
- def setInputCols(value: Array[String]): SentimentDetector.this.type
  Overrides the required annotator columns if different from the default
  - Definition Classes: HasInputAnnotationCols
- def setLazyAnnotator(value: Boolean): SentimentDetector.this.type
  - Definition Classes: CanBeLazy
- final def setOutputCol(value: String): SentimentDetector.this.type
  Overrides the annotation column name when transforming
  - Definition Classes: HasOutputAnnotationCol
- def toString(): String
  - Definition Classes: Identifiable → AnyRef → Any
- def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): SentimentDetectorModel
  - Definition Classes: SentimentDetector → AnnotatorApproach
- final def transformSchema(schema: StructType): StructType
  Requirement for pipeline transformation validation. It is called on fit()
  - Definition Classes: AnnotatorApproach → PipelineStage
- val uid: String
  - Definition Classes: SentimentDetector → Identifiable
- def write: MLWriter
  - Definition Classes: DefaultParamsWritable → MLWritable