BERT Biomedical Embeddings

Description

This embeddings model was imported from Hugging Face (link). It was pretrained from scratch using abstracts from PubMed and full-text articles from PubMed Central. The model achieves state-of-the-art performance on many biomedical NLP tasks and currently holds the top score on the Biomedical Language Understanding and Reasoning Benchmark (BLURB).

Predicted Entities


How to use

Python

from sparknlp.annotator import BertEmbeddings

embeddings = BertEmbeddings.pretrained("bert_biomed_pubmed_uncased", "en") \
      .setInputCols(["sentence", "token"]) \
      .setOutputCol("embeddings")

Scala

import com.johnsnowlabs.nlp.embeddings.BertEmbeddings

val embeddings = BertEmbeddings.pretrained("bert_biomed_pubmed_uncased", "en")
      .setInputCols(Array("sentence", "token"))
      .setOutputCol("embeddings")

NLU

import nlu

nlu.load("en.embed.bert.pubmed.uncased").predict("""Put your text here.""")
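
The embeddings stage consumes sentence and token annotations, so it is normally preceded by a DocumentAssembler, SentenceDetector, and Tokenizer. Below is a minimal end-to-end pipeline sketch; the surrounding stages and the example sentence are illustrative assumptions, not part of this model.

from pyspark.ml import Pipeline
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, BertEmbeddings

# Start a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Raw text -> document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into sentences
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

# Token-level BERT embeddings from this model
embeddings = BertEmbeddings.pretrained("bert_biomed_pubmed_uncased", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])

# Example input (hypothetical biomedical sentence)
data = spark.createDataFrame(
    [["Metformin is a first-line therapy for type 2 diabetes."]]
).toDF("text")

result = pipeline.fit(data).transform(data)

# Each token annotation carries its embedding vector
result.selectExpr("explode(embeddings.embeddings) as vector").show(5)

The output column holds one annotation per token, each with a fixed-size float vector, so downstream stages such as NER models can consume it directly via the [sentence, token, embeddings] input labels listed under Model Information.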

Model Information

Model Name: bert_biomed_pubmed_uncased
Compatibility: Spark NLP 3.4.0+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [embeddings]
Language: en
Size: 411.0 MB
Case sensitive: true

References

https://pubmed.ncbi.nlm.nih.gov/

@misc{pubmedbert,
  author        = {Yu Gu and Robert Tinn and Hao Cheng and Michael Lucas and Naoto Usuyama and Xiaodong Liu and Tristan Naumann and Jianfeng Gao and Hoifung Poon},
  title         = {Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing},
  year          = {2020},
  eprint        = {2007.15779},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}