Instructor Base Sentence Embeddings

Description

Instructor👨‍🏫 is an instruction-finetuned text embedding model that generates text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) from a task instruction alone, without any further finetuning. Instructor👨‍🏫 achieves state-of-the-art results on 70 diverse embedding tasks.

How to use

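    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import InstructorEmbeddings

    # Assumes the raw text is in a column named "text"
    document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

    instruction = InstructorEmbeddings.pretrained("instructor_base", "en") \
        .setInstruction("Instruction here: ") \
        .setInputCols(["document"]) \
        .setOutputCol("instructor")
    pipeline = Pipeline().setStages([document_assembler, instruction])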

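    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.embeddings.InstructorEmbeddings
    import org.apache.spark.ml.Pipeline

    // Assumes the raw text is in a column named "text"
    val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")

    val embeddings = InstructorEmbeddings
      .pretrained("instructor_base", "en")
      .setInstruction("Instruction here: ")
      .setInputCols(Array("document"))
      .setOutputCol("instructor")

    val pipeline = new Pipeline().setStages(Array(document, embeddings))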
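
The "Instruction here: " placeholder above is where the task instruction goes. As a rough sketch, the upstream INSTRUCTOR project describes instructions of the form "Represent the [domain] [text type] for [task objective]:", where the domain and task objective are optional. The helper below assembles such a string. Its name and parameters are hypothetical and are not part of Spark NLP, and the template is an assumption taken from that upstream description rather than from this page.

```python
def build_instruction(text_type, domain=None, task_objective=None):
    """Compose an instruction string (hypothetical helper, not a Spark NLP API)."""
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if task_objective:
        parts.append("for " + task_objective)
    # The trailing ": " mirrors the "Instruction here: " placeholder
    return " ".join(parts) + ": "


print(build_instruction("sentence", domain="Medicine", task_objective="clustering"))
print(build_instruction("title", domain="Science"))
```

The resulting string could then be passed to setInstruction in place of the placeholder.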

Model Information

Model Name:	instructor_base
Compatibility:	Spark NLP 5.4.2+
License:	Open Source
Edition:	Official
Input Labels:	[document]
Output Labels:	[instructor]
Language:	en
Size:	406.0 MB
