Instructor Base Sentence Embeddings

Description

Instructor👨‍🏫 is an instruction-finetuned text embedding model that generates text embeddings tailored to any task (e.g., classification, retrieval, clustering, or text evaluation) and domain (e.g., science or finance) from a task instruction alone, without any further finetuning. Instructor achieves state-of-the-art results on 70 diverse embedding tasks.
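
The upstream hkunlp/instructor-base model card suggests writing instructions in the form "Represent the [domain] [text type] for [task objective]:", where the domain and task objective are optional. The following is a minimal sketch of a helper that assembles such a string, which could then be passed to setInstruction. The name build_instruction is hypothetical and not part of Spark NLP, and the trailing space is an assumption that mirrors the "Instruction here: " placeholder used in the examples on this page.

```python
from typing import Optional


def build_instruction(text_type: str,
                      domain: Optional[str] = None,
                      task_objective: Optional[str] = None) -> str:
    """Assemble 'Represent the [domain] text_type [for task_objective]: '.

    Hypothetical helper for illustration; not part of Spark NLP.
    """
    parts = ["Represent the"]
    if domain:
        parts.append(domain)          # optional, e.g. "Science"
    parts.append(text_type)           # required, e.g. "title"
    if task_objective:
        parts.append(f"for {task_objective}")  # optional, e.g. "retrieval"
    # The trailing space is an assumption (see the lead-in above).
    return " ".join(parts) + ": "


print(build_instruction("title", domain="Science"))
print(build_instruction("document", domain="Wikipedia",
                        task_objective="retrieval"))
```

Keeping the template in one place makes it easier to try several instructions against the same data without retyping them.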

How to use

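    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import InstructorEmbeddings

    # Assumes the input DataFrame holds the raw text in a column named "text".
    document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
    instruction = InstructorEmbeddings.pretrained("instructor_base", "en") \
        .setInstruction("Instruction here: ") \
        .setInputCols(["documents"]) \
        .setOutputCol("instructor")

    pipeline = Pipeline().setStages([document_assembler, instruction])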

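    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.embeddings.InstructorEmbeddings
    import org.apache.spark.ml.Pipeline

    // Assumes the input DataFrame holds the raw text in a column named "text".
    val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("documents")

    val embeddings = InstructorEmbeddings
      .pretrained("instructor_base", "en")
      .setInstruction("Instruction here: ")
      .setInputCols(Array("documents"))
      .setOutputCol("instructor")

    val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))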
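
For the retrieval and clustering tasks mentioned in the description, a common next step is to compare two of the resulting vectors. The following is a minimal sketch in plain Python, assuming the embeddings from the instructor output column have already been collected into lists of floats. The helper cosine_similarity is hypothetical and not part of Spark NLP, and cosine similarity is a common choice here rather than something this model card prescribes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; not part of Spark NLP.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made-up values).
query = [1.0, 2.0, 3.0]
candidate = [2.0, 4.0, 6.0]
print(cosine_similarity(query, candidate))
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate nearly orthogonal vectors.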

Model Information

Model Name:	instructor_base
Compatibility:	Spark NLP 5.0.0+
License:	Open Source
Edition:	Official
Input Labels:	[documents]
Output Labels:	[instructor]
Language:	en
Size:	406.6 MB

References

https://huggingface.co/hkunlp/instructor-base
