Multilingual BioLORD-2023-M XlmRoBertaSentenceEmbeddings from FremyCompany

Description

Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and packaged for scalable, production-ready use with Spark NLP. The model, sent_xlm_roberta_biolord_2023_m, is multilingual and was originally trained by FremyCompany. It supports English, Spanish, French, German, Dutch, Danish, and Swedish.

How to use

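# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import XlmRoBertaSentenceEmbeddings
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP available
spark = sparknlp.start()

# Wrap the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Load the pretrained multilingual ("xx") sentence-embedding model
embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_biolord_2023_m", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])

data = spark.createDataFrame([["Disfruto trabajando con Spark-NLP."]]).toDF("text")
pipelineModel = pipeline.fit(data)
result = pipelineModel.transform(data)

// Scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.XlmRoBertaSentenceEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq(...).toDF; assumes a SparkSession named `spark`

// Wrap the raw text column into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Load the pretrained multilingual ("xx") sentence-embedding model
val embeddings = XlmRoBertaSentenceEmbeddings
  .pretrained("sent_xlm_roberta_biolord_2023_m", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))

val data = Seq("Disfruto trabajando con Spark-NLP.").toDF("text")

val pipelineModel = pipeline.fit(data)
val result = pipelineModel.transform(data)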

Results

+----------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
|                              text|                                                              document|                                                            embeddings|
+----------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
|Disfruto trabajando con Spark-NLP.|[{document, 0, 33, Disfruto trabajando con Spark-NLP., {sentence ->...|[{sentence_embeddings, 0, 33, Disfruto trabajando con Spark-NLP., {...|
+----------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
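
The table truncates each annotation, so the vectors themselves are not visible. In Spark NLP, every annotation struct carries its vector in an embeddings field. The following is a minimal sketch of pulling those vectors out and comparing two of them, assuming the output column is named embeddings as in the code above. The helper names extract_vectors and cosine_similarity are hypothetical, not part of Spark NLP, and cosine similarity is only one common way to compare sentence embeddings; the model card does not prescribe it.

```python
import math


def extract_vectors(annotations):
    """Collect the `embeddings` field of each annotation as a list of floats.

    Works with pyspark Row objects and with plain dicts, since both
    support lookup by field name.
    """
    return [[float(x) for x in ann["embeddings"]] for ann in annotations]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


# Usage with the pipeline output (needs a Spark session and the downloaded model):
# rows = result.select("embeddings").collect()
# vectors = [extract_vectors(row["embeddings"]) for row in rows]
```

Collecting to the driver is only reasonable for small result sets; for larger data, the comparison would normally stay inside Spark.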

Model Information

Model Name: sent_xlm_roberta_biolord_2023_m
Compatibility: Spark NLP 5.5.2+
License: Open Source
Edition: Official
Input Labels: [document]
Output Labels: [xlm_sentence_embeddings]
Language: xx
Size: 1.0 GB
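
Since the model is listed as compatible with Spark NLP 5.5.2 and later, it can help to check the installed version before triggering the 1.0 GB download. The following is a rough sketch under that assumption; parse_version and meets_minimum are hypothetical helpers written for illustration, and the comparison looks only at the leading digits of each dot-separated component.

```python
def parse_version(version):
    """Turn '5.5.2' into (5, 5, 2); trailing non-digits such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum="5.5.2"):
    """True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '6.0' compares like '6.0.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Usage (needs Spark NLP installed):
# import sparknlp
# if not meets_minimum(sparknlp.version()):
#     raise RuntimeError("This model expects Spark NLP 5.5.2 or later")
```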

References

https://huggingface.co/FremyCompany/BioLORD-2023-M