Description
BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network based on the Transformer architecture. It was originally published by
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, 2018 (arXiv:1810.04805).
The weights of this model are those released by the original BERT authors. The model was pre-trained on Chinese Wikipedia; during training, random input masking was applied independently to word pieces, as in the original BERT paper.
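As a rough sketch of what “independent masking of word pieces” means, here is a simplified illustration (not the actual pre-training code: the BERT paper selects about 15% of pieces, and the full procedure sometimes substitutes a random token or keeps the original instead of [MASK]; note that bert_base_chinese splits Chinese text into single-character word pieces):

```python
import random

def mask_word_pieces(pieces, mask_prob=0.15, mask_token="[MASK]"):
    """Independently replace each word piece with [MASK] at the given rate (simplified)."""
    return [mask_token if random.random() < mask_prob else p for p in pieces]

# Chinese word pieces are individual characters in this model's vocabulary
print(mask_word_pieces(list("人工智能正在改变世界")))
```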
How to use
```python
from sparknlp.annotator import BertEmbeddings
from pyspark.ml import Pipeline

embeddings = BertEmbeddings.pretrained("bert_base_chinese", "zh") \
    .setInputCols("sentence", "token") \
    .setOutputCol("embeddings")

# document_assembler, sentence_detector and tokenizer are sketched below
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])
```
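The pipeline above assumes the three upstream stages already exist. A minimal sketch of those stages and a run over a sample DataFrame (the column names follow the usual Spark NLP conventions, and the Chinese sentence is only illustrative):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer

spark = sparknlp.start()

# Raw text -> "document" annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# "document" -> "sentence" annotations
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# "sentence" -> "token" annotations
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# Fit and run the pipeline on a one-row DataFrame
data = spark.createDataFrame([["人工智能正在改变世界。"]]).toDF("text")
result = nlp_pipeline.fit(data).transform(data)
result.selectExpr("explode(embeddings.embeddings) as vector").show(3)
```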
```scala
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

val embeddings = BertEmbeddings.pretrained("bert_base_chinese", "zh")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))
```
```python
import nlu

nlu.load("zh.embed").predict("""Put your text here.""")
```
Model Information
| Model Name: | bert_base_chinese |
| Compatibility: | Spark NLP 3.1.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [token, sentence] |
| Output Labels: | [embeddings] |
| Language: | zh |
| Case sensitive: | true |