Instructor Base Sentence Embeddings

Description

Instructor👨‍🏫 is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) simply by providing a task instruction, with no additional finetuning. Instructor👨‍🏫 achieves state-of-the-art (SOTA) results on 70 diverse embedding tasks.
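
Because the model maps each instruction-plus-text input to a fixed-length vector, tasks such as retrieval or clustering usually come down to comparing those vectors. The sketch below ranks documents against a query by cosine similarity, which is a common (but not the only) choice of metric. It is illustrative only: it assumes the vectors have already been extracted from the pipeline output as plain Python lists of floats, the helpers `cosine_similarity` and `rank_documents` are hypothetical names, and the 3-dimensional vectors are made-up placeholders, not real instructor_base output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_documents(query: Sequence[float],
                   docs: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder vectors, NOT real model output.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]

# Prints pairs in the order doc 1, doc 2, doc 0 (highest score first).
print(rank_documents(query_vec, doc_vecs))
```

In a retrieval setting, the query and the documents would typically be embedded with different, task-appropriate instructions before being compared this way.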


How to use


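from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import InstructorEmbeddings

# Wrap raw text from the "text" column into Spark NLP documents
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
# Replace "Instruction here: " with a task-specific instruction
instruction = InstructorEmbeddings.pretrained("instructor_base", "en") \
    .setInstruction("Instruction here: ") \
    .setInputCols(["document"]) \
    .setOutputCol("instructor")
pipeline = Pipeline().setStages([document_assembler, instruction])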


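import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.InstructorEmbeddings
import org.apache.spark.ml.Pipeline

// Wrap raw text from the "text" column into Spark NLP documents
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")

val embeddings = InstructorEmbeddings
  .pretrained("instructor_base", "en")
  .setInstruction("Instruction here: ") // replace with a task-specific instruction
  .setInputCols(Array("document"))
  .setOutputCol("instructor")

val pipeline = new Pipeline().setStages(Array(document, embeddings))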
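
The "Instruction here: " string above is a placeholder. The upstream INSTRUCTOR project suggests, as we understand it, instructions of the form "Represent the (domain) (text type) for (task objective):", where the domain and objective parts are optional. The sketch below assembles such a string. The helper `build_instruction` is hypothetical and is not part of Spark NLP, and the exact wording that works best for a given task is an assumption to validate on your own data.

```python
from typing import Optional


def build_instruction(text_type: str, domain: Optional[str] = None,
                      objective: Optional[str] = None) -> str:
    """Assemble 'Represent the [domain] <text_type>[ for <objective>]: '."""
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if objective:
        parts.extend(["for", objective])
    # Trailing colon and space mirror the "Instruction here: " placeholder.
    return " ".join(parts) + ": "


print(repr(build_instruction("document", domain="Wikipedia", objective="retrieval")))
# → 'Represent the Wikipedia document for retrieval: '
print(repr(build_instruction("title", domain="Science")))
# → 'Represent the Science title: '
```

The resulting string would then be passed to `.setInstruction(...)` in place of the placeholder.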


Model Information

Model Name: instructor_base
Compatibility: Spark NLP 5.4.2+
License: Open Source
Edition: Official
Input Labels: [document]
Output Labels: [instructor]
Language: en
Size: 406.0 MB