Description
BETO is a BERT model trained on a large Spanish corpus. It is similar in size to BERT-Base and was trained with the Whole Word Masking technique. The original BETO release provides TensorFlow and PyTorch checkpoints for the uncased and cased versions, along with Spanish benchmark results comparing BETO with Multilingual BERT and with other (non-BERT-based) models. This model is the cased version, packaged as a sentence-embeddings annotator.
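Whole Word Masking means that when a word is split into several WordPiece tokens, all of its pieces are masked together rather than independently. The sketch below illustrates the idea only; it is not BETO's actual pretraining code. It assumes the usual WordPiece convention that continuation pieces start with `##`, always substitutes `[MASK]`, and counts the masking budget in words. The function name `whole_word_mask` and the example tokenization are hypothetical.

```python
import random


def whole_word_mask(tokens, mask_prob=0.15, seed=0):
    """Return a copy of `tokens` with randomly chosen whole words masked.

    Simplified illustration: every piece of a chosen word becomes "[MASK]".
    """
    # Group token indices into words: a "##" piece continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    if not words:
        return []

    # Pick at least one word; the budget is counted in words, not tokens.
    rng = random.Random(seed)
    n_to_mask = min(len(words), max(1, round(len(words) * mask_prob)))
    chosen = rng.sample(words, n_to_mask)

    masked = list(tokens)
    for word in chosen:
        for i in word:
            masked[i] = "[MASK]"
    return masked


# Made-up tokenization, for illustration only.
tokens = ["el", "modelo", "fue", "pre", "##entren", "##ado"]
print(whole_word_mask(tokens, mask_prob=0.5))
```

Whichever words are selected, the three pieces of the last word are either all masked or all left intact, which is the property that distinguishes this scheme from masking individual tokens.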
How to use
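from pyspark.ml import Pipeline
from sparknlp.annotator import BertSentenceEmbeddings

# Python. document_assembler and sentence_detector are assumed to be defined earlier.
sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "es") \
    .setInputCols(["sentence"]) \
    .setOutputCol("bert_sentence")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, sent_embeddings])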
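import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import org.apache.spark.ml.Pipeline

// Scala. document_assembler and sentence_detector are assumed to be defined earlier.
val sent_embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "es")
  .setInputCols("sentence")
  .setOutputCol("bert_sentence")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, sent_embeddings))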
import nlu
nlu.load("es.embed_sentence.bert.base_cased").predict("""Put your text here.""")
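Each `bert_sentence` annotation carries one embedding vector per sentence. A common downstream step, which this card does not itself describe, is to compare two sentences by the cosine similarity of their vectors. The following is a minimal sketch under the assumption that the vectors have already been extracted from the pipeline output as plain lists of floats; the function name `cosine_similarity` and the toy vectors are illustrative and are not part of Spark NLP.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Define the similarity as 0.0 when either vector is all zeros.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
emb_a = [0.2, 0.1, 0.7]
emb_b = [0.1, 0.3, 0.6]
print(cosine_similarity(emb_a, emb_b))
```

Values close to 1 indicate vectors pointing in nearly the same direction. How well this tracks semantic similarity for this particular model is not stated in the card and should be checked on your own data.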
Model Information
| Model Name: | sent_bert_base_cased |
| Compatibility: | Spark NLP 3.2.2+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [sentence] |
| Output Labels: | [bert_sentence] |
| Language: | es |
| Case sensitive: | true |
Data Source
The model is imported from: https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased