Description
Instructor👨‍🏫 is an instruction-finetuned text embedding model. Given only a task instruction, and without any further finetuning, it generates text embeddings tailored to any task (e.g., classification, retrieval, clustering, or text evaluation) and domain (e.g., science or finance). Instructor👨‍🏫 achieves state-of-the-art results on 70 diverse embedding tasks.
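The model itself only outputs vectors. Tasks such as retrieval or clustering then compare those vectors, and cosine similarity is a common choice for that comparison. The following minimal sketch is plain Python and independent of Spark NLP. The helper names `cosine_similarity` and `rank_by_similarity` and the toy 3-dimensional vectors are hypothetical stand-ins for embeddings you would extract from the pipeline's output column.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy vectors; real embeddings from this model are much longer.
query = [0.9, 0.1, 0.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.7, 0.7, 0.0]]
print(rank_by_similarity(query, docs))  # [1, 2, 0]
```

Cosine similarity ignores vector length and compares only direction. This is why it is often preferred over a raw dot product when ranking embeddings.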
How to use
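from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import InstructorEmbeddings
# Turn the raw "text" column into document annotations for the embedder
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
instruction = InstructorEmbeddings.pretrained("instructor_base", "en") \
    .setInstruction("Instruction here: ") \
    .setInputCols(["documents"]) \
    .setOutputCol("instructor")
pipeline = Pipeline().setStages([document_assembler, instruction])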
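import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.InstructorEmbeddings
import org.apache.spark.ml.Pipeline

// Turn the raw "text" column into document annotations for the embedder
val document = new DocumentAssembler().setInputCol("text").setOutputCol("documents")
val embeddings = InstructorEmbeddings
  .pretrained("instructor_base", "en")
  .setInstruction("Instruction here: ")
  .setInputCols(Array("documents"))
  .setOutputCol("instructor")
val pipeline = new Pipeline().setStages(Array(document, embeddings))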
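In the snippets above, "Instruction here: " is a placeholder. The upstream INSTRUCTOR documentation suggests phrasing instructions with the template "Represent the {domain} {text_type} for {task_objective}:", for example "Represent the Science title:". The following minimal sketch composes such strings. The helper `build_instruction` is hypothetical. Whether Spark NLP expects a trailing space after the colon, as in the placeholder, is an assumption left to the reader.

```python
def build_instruction(text_type, domain=None, task_objective=None):
    """Compose an INSTRUCTOR-style instruction (hypothetical helper).

    Follows the template "Represent the {domain} {text_type} for {task_objective}:"
    and drops the optional parts that are not given.
    """
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if task_objective:
        parts.append("for " + task_objective)
    return " ".join(parts) + ":"


print(build_instruction("title", domain="Science"))
# Represent the Science title:
print(build_instruction("document", domain="Wikipedia", task_objective="retrieval"))
# Represent the Wikipedia document for retrieval:
```

You could pass the resulting string to `.setInstruction(...)` in place of the placeholder.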
Model Information
Model Name: instructor_base
Compatibility: Spark NLP 5.0.0+
License: Open Source
Edition: Official
Input Labels: [documents]
Output Labels: [instructor]
Language: en
Size: 406.6 MB
References
https://huggingface.co/hkunlp/instructor-base