Description
This model uses a BERT base architecture pretrained from scratch on MEDLINE/PubMed.
The architecture itself is unchanged from BERT base, but the original training and export scheme has been modified based on more recent learnings, and these changes improve its accuracy over the original BERT base checkpoint.
How to use
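```python
from pyspark.ml import Pipeline
from sparknlp.annotator import BertEmbeddings
# document_assembler, sentence_detector and tokenizer must be defined earlier (document -> sentence -> token)
embeddings = BertEmbeddings.pretrained("bert_pubmed", "en") \
    .setInputCols("sentence", "token") \
    .setOutputCol("embeddings")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])
```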
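```scala
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline
// document_assembler, sentence_detector and tokenizer must be defined earlier (document -> sentence -> token)
val embeddings = BertEmbeddings.pretrained("bert_pubmed", "en")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))
```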
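```python
import nlu
text = ["I love NLP"]
# Load the model through NLU; output_level='token' returns one embedding row per token
embeddings_df = nlu.load('en.embed.bert.pubmed').predict(text, output_level='token')
embeddings_df
```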
Model Information
| Property | Value |
|---|---|
| Model Name | bert_pubmed |
| Compatibility | Spark NLP 3.2.0+ |
| License | Open Source |
| Edition | Official |
| Input Labels | [sentence, token] |
| Output Labels | [bert] |
| Language | en |
| Case sensitive | false |
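The model is listed as case-insensitive. For BERT checkpoints this usually means an "uncased" vocabulary. Input is lowercased before WordPiece tokenization, and Google's reference BERT tokenizer also strips accents at that step. The sketch below illustrates that convention in plain Python. `normalize_uncased` is a hypothetical helper, not part of Spark NLP. The exact normalization that Spark NLP's `BertEmbeddings` applies internally may differ.

```python
import unicodedata


def normalize_uncased(text: str) -> str:
    """Hypothetical helper approximating BERT 'uncased' preprocessing.

    Lowercases the text, decomposes accented characters (Unicode NFD)
    and drops the resulting combining marks (category 'Mn').
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # Print each sample next to its normalized form
    for sample in ["I love NLP", "i love nlp", "Ménière's disease"]:
        print(repr(sample), "->", repr(normalize_uncased(sample)))
```

The first two samples both normalize to `i love nlp`. Under this convention, changing the casing of the input does not change what an uncased model sees.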
Data Source
This model was imported from TensorFlow Hub: https://tfhub.dev/google/experts/bert/pubmed/2