Spanish CBOW Legal Fast Text Embeddings (Cased, D300)

Description

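Word embeddings lookup annotator that maps each token to a 300-dimensional vector. The embeddings were trained with the CBOW (continuous bag-of-words) objective, in which the distributed representations of the surrounding context words are combined to predict the word in the middle.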
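The CBOW idea described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the training code behind this model: the vocabulary, the 2-dimensional vectors, and the function name `cbow_predict` are all made up, the context is combined by averaging (one common choice), and a full softmax stands in for the cheaper approximations that real trainers typically use.

```python
import math

def cbow_predict(context_ids, input_vectors, output_vectors):
    """Return a probability distribution over the vocabulary for the middle word."""
    dim = len(input_vectors[0])
    # Combine the context: average the input vectors of the surrounding words.
    hidden = [
        sum(input_vectors[i][d] for i in context_ids) / len(context_ids)
        for d in range(dim)
    ]
    # Score every vocabulary word against the combined context vector.
    scores = [sum(h * w for h, w in zip(hidden, out)) for out in output_vectors]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-word vocabulary with made-up 2-dimensional vectors.
vocab = ["el", "contrato", "es", "nulo"]
input_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
output_vectors = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3], [-0.4, 0.2]]

# Predict the middle word from the context "el ... es".
probs = cbow_predict([0, 2], input_vectors, output_vectors)
predicted = vocab[probs.index(max(probs))]
```

During training, the vectors would be adjusted so that the true middle word receives a higher probability; the learned input vectors are what an embeddings model of this kind stores.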


How to use

 
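from sparknlp.annotator import WordEmbeddingsModel

# Load the pretrained Spanish legal embeddings; expects "document" and "token" columns upstream
model = WordEmbeddingsModel.pretrained("word2vec_cbow_legal_d300_cased", "es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("word_embeddings")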

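import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

// Load the pretrained Spanish legal embeddings; expects "document" and "token" columns upstream
val model = WordEmbeddingsModel.pretrained("word2vec_cbow_legal_d300_cased", "es")
  .setInputCols("document", "token")
  .setOutputCol("word_embeddings")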

import nlu
nlu.load("es.embed.legal.cbow.cased_d300").predict("""Put your text here.""")

Model Information

Model Name:	word2vec_cbow_legal_d300_cased
Type:	embeddings
Compatibility:	Spark NLP 4.2.1+
License:	Open Source
Edition:	Official
Input Labels:	[document, token]
Output Labels:	[embeddings]
Language:	es
Size:	1.1 GB
Case sensitive:	false
Dimension:	300
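
To make the "Case sensitive" and "Dimension" fields more concrete, here is a simplified sketch of a token-to-vector lookup. It is not the library's implementation: the `lookup` function and the tiny table are hypothetical, and two behaviors are assumptions made for illustration, namely that a case-insensitive lookup lowercases the token first and that unknown tokens fall back to a zero vector.

```python
def lookup(token, table, dim, case_sensitive=False):
    """Return the vector stored for `token`, or a zero vector if it is unknown."""
    # Assumption: case-insensitive lookups normalize the token to lowercase.
    key = token if case_sensitive else token.lower()
    # Assumption: out-of-vocabulary tokens map to a zero vector of length `dim`.
    return table.get(key, [0.0] * dim)

# Hypothetical 3-dimensional table (the real model stores 300 values per word).
table = {"contrato": [0.1, 0.2, 0.3]}

vec_lower = lookup("contrato", table, dim=3)
vec_upper = lookup("Contrato", table, dim=3)                       # matched after lowercasing
vec_cased = lookup("Contrato", table, dim=3, case_sensitive=True)  # no exact match
```

With a case-insensitive lookup, "Contrato" and "contrato" share one vector; with a case-sensitive one, they are treated as different words and each needs its own entry.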

References

https://zenodo.org/record/5036147#.Y3Op0XZBxD-
