Sentiment Analysis of French texts

Description

This model classifies the sentiment of French text as positive or negative.

Predicted Entities

NEGATIVE, POSITIVE

How to use

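import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import BertSentenceEmbeddings, ClassifierDLModel

spark = sparknlp.start()

# Wrap raw text into Spark NLP documents
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Multilingual LaBSE sentence embeddings
embeddings = BertSentenceEmbeddings \
    .pretrained("labse", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# French sentiment classifier; it consumes the sentence embeddings (see Input Labels)
sentimentClassifier = ClassifierDLModel.pretrained("classifierdl_bert_sentiment", "fr") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

fr_sentiment_pipeline = Pipeline(stages=[document, embeddings, sentimentClassifier])

# Every stage is pretrained, so fitting on a one-row empty DataFrame only assembles the pipeline
light_pipeline = LightPipeline(fr_sentiment_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
result1 = light_pipeline.annotate("Mignolet vraiment dommage de ne jamais le voir comme titulaire")
result2 = light_pipeline.annotate("Je me sens bien, je suis heureux d'être de retour.")
print(result1["class"], result2["class"], sep="\n")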

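import com.johnsnowlabs.nlp.{DocumentAssembler, LightPipeline}
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes an active SparkSession named `spark`

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val embeddings = BertSentenceEmbeddings
  .pretrained("labse", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")

val sentimentClassifier = ClassifierDLModel.pretrained("classifierdl_bert_sentiment", "fr")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("class")

val fr_sentiment_pipeline = new Pipeline().setStages(Array(document, embeddings, sentimentClassifier))

// Every stage is pretrained, so fitting on a one-row empty DataFrame only assembles the pipeline
val light_pipeline = new LightPipeline(fr_sentiment_pipeline.fit(Seq("").toDF("text")))
val result1 = light_pipeline.annotate("Mignolet vraiment dommage de ne jamais le voir comme titulaire")
val result2 = light_pipeline.annotate("Je me sens bien, je suis heureux d'être de retour.")
println(result1("class"))
println(result2("class"))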

import nlu
nlu.load("fr.classify.sentiment.bert").predict("""Mignolet vraiment dommage de ne jamais le voir comme titulaire""")

Results

['NEGATIVE']
['POSITIVE']
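
Each printed value is the list stored under the "class" key of the dictionary returned by annotate. As a minimal sketch of working with that shape, the snippet below pulls out a single label per text and tallies the labels. The two dictionaries are hand-written stand-ins that mirror the results above, not live model output, and first_label is a hypothetical helper name.

```python
from collections import Counter

# Hand-written stand-ins mirroring the "class" output shown above (not live model output)
result1 = {"class": ["NEGATIVE"]}
result2 = {"class": ["POSITIVE"]}

def first_label(result, column="class"):
    """Return the first label in the given output column, or None if it is missing or empty."""
    labels = result.get(column, [])
    return labels[0] if labels else None

# One label per annotated text, then a count per label
labels = [first_label(r) for r in (result1, result2)]
counts = Counter(labels)

print(labels)
print(dict(counts))
```

Returning None for an empty column keeps the tally step from failing on texts for which no label was produced.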

Model Information

Model Name:	classifierdl_bert_sentiment
Compatibility:	Spark NLP 3.2.0+
License:	Open Source
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[class]
Language:	fr

Data Source

https://github.com/charlesmalafosse/open-dataset-for-sentiment-analysis/

Benchmarking

              precision    recall  f1-score   support

    NEGATIVE       0.82      0.72      0.77       378
    POSITIVE       0.92      0.95      0.94      1240

    accuracy                           0.90      1618
   macro avg       0.87      0.84      0.85      1618
weighted avg       0.90      0.90      0.90      1618
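
The averaged rows can be cross-checked against the per-class rows. The sketch below assumes the table follows the usual classification-report conventions: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over classes, and the weighted average uses each class's support as its weight. The helper names are illustrative. Because the inputs are already rounded to two decimals, recomputed values may differ from the reported ones by up to about 0.01.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
rows = {
    "NEGATIVE": (0.82, 0.72, 0.77, 378),
    "POSITIVE": (0.92, 0.95, 0.94, 1240),
}

total_support = sum(row[3] for row in rows.values())

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_avg(col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in rows.values()) / len(rows)

def weighted_avg(col):
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[col] * row[3] for row in rows.values()) / total_support

for name, (p, r, reported_f1, n) in rows.items():
    print(f"{name}: F1 from P/R = {f1_score(p, r):.3f} (reported {reported_f1}), "
          f"share of test set = {n / total_support:.1%}")

for col, metric in enumerate(("precision", "recall", "f1-score")):
    print(f"{metric}: macro = {macro_avg(col):.3f}, weighted = {weighted_avg(col):.3f}")
```

The NEGATIVE class makes up under a quarter of the 1618 test examples, which is why the weighted averages sit closer to the POSITIVE row than the macro averages do.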
