BAAI general embedding English (bge_small)

Description

FlagEmbedding maps any text to a low-dimensional dense vector that can be used for tasks such as retrieval, classification, clustering, and semantic search. The vectors can also be stored in a vector database for use with LLMs.

bge is short for BAAI general embedding.
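
To make the retrieval use case concrete, the following minimal sketch ranks passages by cosine similarity to a query vector. The three-dimensional vectors are made-up toy values, not real model output, and the helper names `cosine_similarity` and `rank_passages` are hypothetical, introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return (passage index, score) pairs, most similar passage first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only (assumption: not produced by bge_small).
query = [0.9, 0.1, 0.0]
passages = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]

ranking = rank_passages(query, passages)
print([index for index, _ in ranking])
```

With real embeddings, the same ranking step would run over vectors produced by the model instead of the toy lists.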

| Model | Language | Description | Query instruction for retrieval* |
|---|---|---|---|
| BAAI/bge-large-en | English | ranks 1st on the MTEB leaderboard | Represent this sentence for searching relevant passages: |
| BAAI/bge-base-en | English | ranks 2nd on the MTEB leaderboard | Represent this sentence for searching relevant passages: |
| BAAI/bge-small-en | English | a small-scale model with competitive performance | Represent this sentence for searching relevant passages: |
| BAAI/bge-large-zh | Chinese | ranks 1st on the C-MTEB benchmark | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-large-zh-noinstruct | Chinese | trained without an instruction; ranks 2nd on the C-MTEB benchmark | (none) |
| BAAI/bge-base-zh | Chinese | a base-scale model with ability similar to bge-large-zh | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-small-zh | Chinese | a small-scale model with competitive performance | 为这个句子生成表示以用于检索相关文章: |
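
The query instructions listed above are meant to be prepended to the query text before it is embedded for retrieval. The sketch below shows one way to do that for the English models. The helper names `prepare_query` and `prepare_passage` are hypothetical, and two details are assumptions rather than facts stated here: that a single space joins the instruction and the query, and that passages are embedded without any instruction.

```python
# Instruction string for the English models, copied from the table above.
QUERY_INSTRUCTION_EN = "Represent this sentence for searching relevant passages:"


def prepare_query(query, instruction=QUERY_INSTRUCTION_EN):
    """Prefix a search query with the retrieval instruction.

    Assumption: instruction and query are joined with a single space.
    """
    return f"{instruction} {query.strip()}"


def prepare_passage(passage):
    """Passages are passed through unchanged (assumption: no instruction needed)."""
    return passage.strip()


print(prepare_query("what is a dense embedding?"))
```

The prepared strings would then go into the `text` column that the pipeline reads.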

How to use

                
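from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings

# Wrap the raw "text" column into Spark NLP document annotations.
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens.
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained bge_small model; it emits one embedding per token.
embeddings = BertEmbeddings.pretrained("bge_small", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")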


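import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings

// Wrap the raw "text" column into Spark NLP document annotations.
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens.
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Load the pretrained bge_small model; it emits one embedding per token.
val embeddings = BertEmbeddings.pretrained("bge_small", "en")
  .setInputCols("document", "token")
  .setOutputCol("embeddings")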
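
The snippets above only define the stages. As a rough Python sketch of how they might be chained and run, the function below builds a `Pipeline`, applies it to a few input strings, and returns one row per token with its vector. It assumes Spark NLP and PySpark are installed and that the `bge_small` model can be downloaded. The function name `embed_texts` is hypothetical.

```python
def embed_texts(texts):
    """Run the three stages as one pipeline; returns a DataFrame of (token, vector) rows."""
    # Imports live inside the function so that defining it does not require Spark.
    import sparknlp
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, BertEmbeddings

    spark = sparknlp.start()

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    embeddings = (
        BertEmbeddings.pretrained("bge_small", "en")
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )

    pipeline = Pipeline(stages=[document, tokenizer, embeddings])

    # One input row per text, in the "text" column that DocumentAssembler reads.
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    result = pipeline.fit(data).transform(data)

    # Each annotation carries the token string in `result` and its vector in `embeddings`.
    return result.selectExpr("explode(embeddings) as e").selectExpr(
        "e.result as token", "e.embeddings as vector"
    )


# Example call (starts a Spark session and downloads the model on first use):
# embed_texts(["Dense vectors are useful for semantic search."]).show(truncate=80)
```

The output is token-level. Turning it into a single vector per sentence needs an additional pooling step that is not shown here.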

Model Information

Model Name: bge_small
Compatibility: Spark NLP 5.0.2+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [embeddings]
Language: en
Size: 79.9 MB
Case sensitive: true

References

The bge models are developed and released by BAAI (Beijing Academy of Artificial Intelligence).