Sentiment Analysis in Spanish

Description

Model trained on the TASS 2020 corpus (about 5,000 tweets), which covers several dialects of Spanish. The base model is BETO, a BERT model trained on Spanish text.

Predicted Entities

POS, NEG, NEU

How to use

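import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')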

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The language code is "es", matching the model's language (see Model Information)
sequenceClassifier = BertForSequenceClassification.pretrained("beto_sentiment", "es") \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')

pipeline = Pipeline(stages=[
    document_assembler, 
    tokenizer,
    sequenceClassifier   
])

# a simple example
example = spark.createDataFrame([["Te quiero. Te amo."]]).toDF("text")

result = pipeline.fit(example).transform(example)

# result is a DataFrame
result.select("text", "class.result").show()

# Alternatively, the same prediction as a one-liner with the NLU library
import nlu
nlu.load("es.classify.beto_bert.sentiment").predict("""Te quiero. Te amo.""")

Results

+------------------+------+
|              text|result|
+------------------+------+
|Te quiero. Te amo.| [POS]|
+------------------+------+
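
As the table shows, `class.result` holds an array of labels for each row. The sketch below is one possible way to summarize such output once it has been collected to the driver. The helper name `summarize_labels` and the numeric polarity values are hypothetical, chosen only for illustration; they are not part of the model or of Spark NLP.

```python
from collections import Counter

# Illustrative mapping from the model's labels to a numeric polarity.
# These values are an assumption made for this sketch, not part of the model.
POLARITY = {"POS": 1, "NEU": 0, "NEG": -1}


def summarize_labels(rows):
    """Count labels and compute the mean polarity.

    `rows` is a list of label lists, the shape you would get from e.g.
    [r["result"] for r in result.select("class.result").collect()].
    """
    # Flatten the per-row label arrays into a single list
    labels = [label for row in rows for label in row]
    counts = Counter(labels)
    # Guard against empty input to avoid dividing by zero
    mean = sum(POLARITY[label] for label in labels) / len(labels) if labels else 0.0
    return dict(counts), mean


# Stand-in for collected predictions on four tweets
counts, mean = summarize_labels([["POS"], ["NEG"], ["POS"], ["NEU"]])
print(counts, mean)
```

Collecting to the driver is only reasonable for small result sets; for larger ones the same aggregation would more likely be done with DataFrame operations.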

Model Information

Model Name: beto_sentiment
Compatibility: Spark NLP 4.2.0+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: es
Size: 412.4 MB
Case sensitive: true
Max sentence length: 128

References

https://github.com/finiteautomata/pysentimiento/