com.johnsnowlabs.nlp.annotators.sda.vivekn

ViveknSentimentApproach

class ViveknSentimentApproach extends AnnotatorApproach[ViveknSentimentModel] with ViveknSentimentUtils

Trains a sentiment analyzer inspired by Vivek Narayanan's algorithm (https://github.com/vivekn/sentiment/).

The algorithm is based on the paper "Fast and accurate sentiment classification using an enhanced Naive Bayes model".
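To make the Naive Bayes idea concrete, here is a minimal, generic sketch of a two-class multinomial Naive Bayes classifier with add-one smoothing. It is not the implementation used by this annotator and leaves out the enhancements described in the paper. The object name `NaiveBayesSketch`, its helper functions, and the assumption of equal class priors are all illustrative.

```scala
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative only: NOT the ViveknSentimentApproach implementation.
// Assumes equal prior probability for both classes.
object NaiveBayesSketch {
  type Counts = Map[String, Long]

  // Count how often each token occurs across all documents of one class.
  def countTokens(docs: Seq[Seq[String]]): Counts =
    docs.flatten.groupBy(identity).map { case (word, hits) => word -> hits.size.toLong }

  // Smoothed log-likelihood of the tokens under one class.
  def logLikelihood(tokens: Seq[String], counts: Counts, vocabSize: Int): Double = {
    val total = counts.values.sum
    tokens.map { t =>
      math.log((counts.getOrElse(t, 0L) + 1.0) / (total + vocabSize))
    }.sum
  }

  // Pick the class with the higher log-likelihood (ties go to "positive").
  def classify(tokens: Seq[String], positive: Counts, negative: Counts): String = {
    val vocabSize = (positive.keySet ++ negative.keySet).size
    val pos = logLikelihood(tokens, positive, vocabSize)
    val neg = logLikelihood(tokens, negative, vocabSize)
    if (pos >= neg) "positive" else "negative"
  }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps unseen words from forcing a probability of zero.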

The analyzer requires sentence boundaries to give a score in context, and tokenization to make sure tokens are within bounds. Any transitive requirements of these annotators must also be satisfied.

The training data needs to consist of a column for normalized text and a label column (either "positive" or "negative").

For extended examples of usage, see the Examples and the ViveknSentimentTestSpec.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.Normalizer
import com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentApproach
import com.johnsnowlabs.nlp.Finisher
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val token = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normal")

val vivekn = new ViveknSentimentApproach()
  .setInputCols("document", "normal")
  .setSentimentCol("train_sentiment")
  .setOutputCol("result_sentiment")

val finisher = new Finisher()
  .setInputCols("result_sentiment")
  .setOutputCols("final_sentiment")

val pipeline = new Pipeline().setStages(Array(document, token, normalizer, vivekn, finisher))

val training = Seq(
  ("I really liked this movie!", "positive"),
  ("The cast was horrible", "negative"),
  ("Never going to watch this again or recommend it to anyone", "negative"),
  ("It's a waste of time", "negative"),
  ("I loved the protagonist", "positive"),
  ("The music was really really good", "positive")
).toDF("text", "train_sentiment")
val pipelineModel = pipeline.fit(training)

val data = Seq(
  "I recommend this movie",
  "Dont waste your time!!!"
).toDF("text")
val result = pipelineModel.transform(data)

result.select("final_sentiment").show(false)
+---------------+
|final_sentiment|
+---------------+
|[positive]     |
|[negative]     |
+---------------+
See also

SentimentDetector for an alternative approach to sentiment detection

Linear Supertypes
ViveknSentimentUtils, AnnotatorApproach[ViveknSentimentModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[ViveknSentimentModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Parameters

A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.

  1. val featureLimit: IntParam

    Content feature limit, to boost performance in very dirty text (Default: Disabled with -1)

  2. val importantFeatureRatio: DoubleParam

    Proportion of feature content to be considered relevant (Default: 0.5)

  3. val pruneCorpus: IntParam

    Removes infrequent scenarios from scope. The higher the value, the better the performance (Default: 1)

  4. val sentimentCol: Param[String]

    Column with the sentiment result of every row. Must be "positive" or "negative".

  5. val unimportantFeatureStep: DoubleParam

    Proportion to lookahead in unimportant features (Default: 0.025)
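As a brief illustration of how these parameters might be set and read back, the sketch below chains the setters documented on this page and then inspects the result through the inherited `Params` members. It assumes Spark NLP is on the classpath and runs in a script or REPL context. The name `tunedVivekn` and the chosen values are illustrative, and `setPruneCorpus(0)` is assumed here to keep infrequent words on a small corpus.

```scala
import com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentApproach

// Hypothetical configuration; column names follow the example on this page.
val tunedVivekn = new ViveknSentimentApproach()
  .setInputCols("document", "normal")
  .setSentimentCol("train_sentiment")
  .setOutputCol("result_sentiment")
  .setPruneCorpus(0)                 // assumption: avoids cutting off infrequent words
  .setImportantFeatureRatio(0.5)     // documented default
  .setUnimportantFeatureStep(0.025)  // documented default
  .setFeatureLimit(-1)               // documented default (disabled)

// Print every parameter with its documentation and current value.
println(tunedVivekn.explainParams())

// Read a single value back through the generic Params accessor.
val ratio: Double = tunedVivekn.getOrDefault(tunedVivekn.importantFeatureRatio)
```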

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator types: TOKEN, DOCUMENT

    Definition Classes
    ViveknSentimentApproach → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output annotator type: SENTIMENT

    Definition Classes
    ViveknSentimentApproach → HasOutputAnnotatorType

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def ViveknWordCount(er: ExternalResource, prune: Int, f: (List[String]) ⇒ Set[String], left: Map[String, Long] = ..., right: Map[String, Long] = ...): (Map[String, Long], Map[String, Long])
    Definition Classes
    ViveknSentimentUtils
  2. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  3. final def clear(param: Param[_]): ViveknSentimentApproach.this.type
    Definition Classes
    Params
  4. final def copy(extra: ParamMap): Estimator[ViveknSentimentModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  5. val description: String

    Vivekn inspired sentiment analysis model

    Definition Classes
    ViveknSentimentApproach → AnnotatorApproach
  6. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  7. def explainParams(): String
    Definition Classes
    Params
  8. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  9. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  10. final def fit(dataset: Dataset[_]): ViveknSentimentModel
    Definition Classes
    AnnotatorApproach → Estimator
  11. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[ViveknSentimentModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  12. def fit(dataset: Dataset[_], paramMap: ParamMap): ViveknSentimentModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  13. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): ViveknSentimentModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  14. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  15. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  16. def getInputCols: Array[String]

    returns

    input annotations columns currently used

    Definition Classes
    HasInputAnnotationCols
  17. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  18. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  19. final def getOutputCol: String

    Gets the name of the annotation column to be generated

    Definition Classes
    HasOutputAnnotationCol
  20. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  21. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  22. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  23. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  24. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  25. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  26. def negateSequence(words: Array[String]): Set[String]

    Detects negations and transforms them into not_ form

    Definition Classes
    ViveknSentimentUtils
  27. def onTrained(model: ViveknSentimentModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  28. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  29. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  30. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  31. final def set[T](param: Param[T], value: T): ViveknSentimentApproach.this.type
    Definition Classes
    Params
  32. final def setInputCols(value: String*): ViveknSentimentApproach.this.type
    Definition Classes
    HasInputAnnotationCols
  33. def setInputCols(value: Array[String]): ViveknSentimentApproach.this.type

    Overrides the required annotator columns if different from the default

    Definition Classes
    HasInputAnnotationCols
  34. def setLazyAnnotator(value: Boolean): ViveknSentimentApproach.this.type
    Definition Classes
    CanBeLazy
  35. final def setOutputCol(value: String): ViveknSentimentApproach.this.type

    Overrides the annotation column name when transforming

    Definition Classes
    HasOutputAnnotationCol
  36. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  37. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): ViveknSentimentModel
  38. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit()

    Definition Classes
    AnnotatorApproach → PipelineStage
  39. val uid: String
    Definition Classes
    ViveknSentimentApproach → Identifiable
  40. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable
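The `negateSequence` member listed above is described as turning negations into a not_ form. The following standalone sketch shows one way such marking can work. It is not the library's code: the trigger words, the delimiter set, the scoping rule, and the names `NegationSketch` and `markNegations` are all assumptions, and unlike `negateSequence` it returns an ordered sequence rather than a `Set[String]`.

```scala
// Illustrative negation marking; NOT the ViveknSentimentUtils implementation.
object NegationSketch {
  // Hypothetical trigger words and scope-ending delimiters.
  private val negations = Set("not", "no", "never", "cannot")
  private val delimiters = Set(".", ",", "!", "?", ":", ";")

  // Prefix each token that follows a negation word with "not_"
  // until a delimiter token ends the negated scope.
  def markNegations(words: Seq[String]): Seq[String] = {
    val (marked, _) =
      words.map(_.toLowerCase).foldLeft((Vector.empty[String], false)) {
        case ((acc, negated), word) =>
          if (delimiters.contains(word)) (acc :+ word, false)
          else if (negations.contains(word)) (acc :+ word, true)
          else (acc :+ (if (negated) "not_" + word else word), negated)
      }
    marked
  }
}
```

With this rule, the tokens of "I did not like this movie" become `i, did, not, not_like, not_this, not_movie`, so a bag-of-words classifier can tell "like" apart from "not like".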

Parameter setters

  1. def setFeatureLimit(v: Int): ViveknSentimentApproach.this.type

    Set content feature limit, to boost performance in very dirty text (Default: Disabled with -1)

  2. def setImportantFeatureRatio(v: Double): ViveknSentimentApproach.this.type

    Set Proportion of feature content to be considered relevant (Default: 0.5)

  3. def setPruneCorpus(value: Int): ViveknSentimentApproach.this.type

    When training on small data, you may want to disable this so that infrequent words are not cut off

  4. def setSentimentCol(value: String): ViveknSentimentApproach.this.type

    Column with the sentiment label of every row, used for training. Must be 'positive' or 'negative'. If not set, external sources need to be set instead.

  5. def setUnimportantFeatureStep(v: Double): ViveknSentimentApproach.this.type

    Set Proportion to lookahead in unimportant features (Default: 0.025)

Parameter getters

  1. def getFeatureLimit(v: Int): Int

    Get content feature limit, to boost performance in very dirty text (Default: Disabled with -1)

  2. def getImportantFeatureRatio(v: Double): Double

    Get Proportion of feature content to be considered relevant (Default: 0.5)

  3. def getUnimportantFeatureStep(v: Double): Double

    Get Proportion to lookahead in unimportant features (Default: 0.025)