Description
Word Embeddings lookup annotator that maps tokens to vectors. In CBOW, the distributed representations of the context (the surrounding words) are combined to predict the word in the middle.
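As a rough illustration of that idea only (not the code used to train this model), the Python sketch below averages made-up context vectors and ranks candidate center words by similarity to that average; the vocabulary, vectors, and window are invented for the example.

import numpy as np

# Toy illustration of the CBOW objective: average the vectors of the
# surrounding context words, then score candidate center words against
# that averaged context. Vocabulary, vectors, and window are invented.
np.random.seed(0)
dim = 4
vocab = ["la", "ley", "establece", "que", "el"]
vectors = {w: np.random.rand(dim) for w in vocab}

context = ["la", "establece"]  # words around the held-out center word "ley"
context_vec = np.mean([vectors[w] for w in context], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank every vocabulary word by similarity to the averaged context.
scores = {w: cosine(context_vec, v) for w, v in vectors.items()}
print(sorted(scores, key=scores.get, reverse=True))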
How to use
from sparknlp.annotator import WordEmbeddingsModel

model = WordEmbeddingsModel.pretrained("word2vec_cbow_legal_d300_uncased", "es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("word_embeddings")
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

val model = WordEmbeddingsModel.pretrained("word2vec_cbow_legal_d300_uncased", "es")
    .setInputCols(Array("document", "token"))
    .setOutputCol("word_embeddings")
import nlu
nlu.load("es.embed.legal.cbow.uncased_d300").predict("""Put your text here.""")
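The snippets above only configure the annotator. For context, here is a minimal end-to-end Python sketch of where it sits in a pipeline; it assumes Spark NLP is installed and started with sparknlp.start(), and the sample sentence and DataFrame column name are illustrative only.

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, WordEmbeddingsModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

# The embeddings lookup needs DOCUMENT and TOKEN annotations as input.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained("word2vec_cbow_legal_d300_uncased", "es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("word_embeddings")

pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings])

data = spark.createDataFrame([["El contrato fue firmado por ambas partes."]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Show each token next to the first dimension of its 300-d vector.
result.selectExpr("explode(word_embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings[0] as dim_0") \
    .show(truncate=False)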
Model Information
| Model Name: | word2vec_cbow_legal_d300_uncased |
| Type: | embeddings |
| Compatibility: | Spark NLP 4.2.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [embeddings] |
| Language: | es |
| Size: | 996.8 MB |
| Case sensitive: | false |
| Dimension: | 300 |
References
https://zenodo.org/record/5036147