CamemBERT Base Model

Description

CamemBERT is a state-of-the-art language model for French, based on the RoBERTa model. For further information or requests, see the CamemBERT website.

How to use

Python:

from sparknlp.annotator import CamemBertEmbeddings

embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \
    .setInputCols("sentence", "token") \
    .setOutputCol("embeddings")

Scala:

import com.johnsnowlabs.nlp.embeddings.CamemBertEmbeddings

val embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

NLU:

import nlu
nlu.load("fr.embed.camembert_base").predict("""Put your text here.""")
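The snippets above only configure the embeddings annotator. To produce its inputs, it usually sits behind a document assembler and a tokenizer. The following is a minimal sketch of such a pipeline, not a definitive setup: the column names `text` and `document`, the helper names `build_camembert_pipeline` and `run_example`, and the sample sentence are assumptions for illustration. Running it requires Spark NLP and PySpark to be installed and downloads the pretrained model on first use.

```python
def build_camembert_pipeline():
    """Assemble a document -> token -> CamemBERT embeddings pipeline."""
    # Imported lazily so the function can be defined without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, CamemBertEmbeddings

    # Turns the raw "text" column (assumed name) into document annotations.
    document_assembler = (
        DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")
    )
    # Splits each document into tokens.
    tokenizer = (
        Tokenizer()
        .setInputCols(["document"])
        .setOutputCol("token")
    )
    # Loads the pretrained model named on this card.
    embeddings = (
        CamemBertEmbeddings.pretrained("camembert_base", "fr")
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )
    return Pipeline(stages=[document_assembler, tokenizer, embeddings])


def run_example(text="J'aime le camembert."):
    """Fit and apply the pipeline to one sentence (hypothetical helper)."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([[text]]).toDF("text")
    model = build_camembert_pipeline().fit(data)
    # Each row holds one embedding vector per token.
    return model.transform(data).select("embeddings.embeddings")
```

Calling `run_example().show()` should display the token vectors for the sample sentence, assuming the model download succeeds.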

Model Information

Model Name: camembert_base
Compatibility: Spark NLP 5.5.0+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [camembert]
Language: fr
Size: 264.0 MB

Benchmarking



| Model                                    | #params | Arch. | Training data                     |
|------------------------------------------|---------|-------|-----------------------------------|
| `camembert-base`                         | 110M    | Base  | OSCAR (138 GB of text)            |
| `camembert/camembert-large`              | 335M    | Large | CCNet (135 GB of text)            |
| `camembert/camembert-base-ccnet`         | 110M    | Base  | CCNet (135 GB of text)            |
| `camembert/camembert-base-wikipedia-4gb` | 110M    | Base  | Wikipedia (4 GB of text)          |
| `camembert/camembert-base-oscar-4gb`     | 110M    | Base  | Subsample of OSCAR (4 GB of text) |
| `camembert/camembert-base-ccnet-4gb`     | 110M    | Base  | Subsample of CCNet (4 GB of text) |