Description
Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model trained on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and this multimodal version supports a context length of 128K tokens. It underwent a rigorous enhancement process that combined supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
How to use
# Python
import os
from pathlib import Path

import sparknlp
from sparknlp.base import ImageAssembler
from sparknlp.annotator import Phi3Vision
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

spark = sparknlp.start()

# Download two sample images (the ! lines are notebook shell commands)
url1 = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
url2 = "http://images.cocodataset.org/val2017/000000039769.jpg"
Path("images").mkdir(exist_ok=True)
!wget -q -O images/image1.jpg {url1}
!wget -q -O images/image2.jpg {url2}

# Load the images and attach the prompt as a "text" column
images_path = "file://" + os.getcwd() + "/images/"
image_df = spark.read.format("image").load(path=images_path)
test_df = image_df.withColumn("text", lit("<|user|> \n <|image_1|> \n What's this picture about? <|end|>\n <|assistant|>\n"))

image_assembler = ImageAssembler().setInputCol("image").setOutputCol("image_assembler")

# Generate at most 50 output tokens per image
phi3_vision = Phi3Vision.pretrained("phi_3_vision_128k_instruct", "en") \
    .setMaxOutputLength(50) \
    .setInputCols("image_assembler") \
    .setOutputCol("answer")

pipeline = Pipeline(stages=[image_assembler, phi3_vision])
results = pipeline.fit(test_df).transform(test_df)
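The `answer` column produced by the pipeline holds Spark NLP annotations rather than plain strings. A minimal sketch of pulling out the generated text, assuming each collected row exposes `answer` as a list of annotation records with a `result` field (the usual Spark NLP annotation layout). `extract_answers` is a hypothetical helper, not part of the library, and the sample rows are stand-ins so the sketch runs without a Spark session:

```python
from typing import Any, Iterable, List


def extract_answers(rows: Iterable[Any], output_col: str = "answer") -> List[str]:
    """Join the `result` text of every annotation in `output_col`, one string per row.

    Accepts anything indexable by column name, such as pyspark Rows from
    `results.select("answer").collect()` or plain dicts.
    """
    answers = []
    for row in rows:
        annotations = row[output_col] or []
        # Each annotation carries the generated text in its `result` field.
        answers.append(" ".join(a["result"].strip() for a in annotations))
    return answers


# Stand-in for collected rows (assumed shape, not real model output).
sample_rows = [{"answer": [{"result": " A cat lying on a couch. "}]}]
print(extract_answers(sample_rows))
```

For a quick look without collecting, `results.select("answer.result").show(truncate=False)` should print the same text directly from the DataFrame.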
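// Scala
import com.johnsnowlabs.nlp.ImageAssembler
import com.johnsnowlabs.nlp.annotators.cv.Phi3Vision
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Load every readable image in the folder, dropping invalid files
val imageFolder = "src/test/resources/image/"
val imageDF: DataFrame = ResourceHelper.spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

// Attach the prompt as a "text" column
val testDF: DataFrame = imageDF.withColumn(
  "text",
  lit("<|user|> \n <|image_1|> \n What's this picture about? <|end|>\n <|assistant|>\n"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

// Generate at most 50 output tokens per image
val phi3Vision = Phi3Vision
  .pretrained("phi_3_vision_128k_instruct", "en")
  .setInputCols("image_assembler")
  .setOutputCol("answer")
  .setMaxOutputLength(50)

val newPipeline: Pipeline =
  new Pipeline().setStages(Array(imageAssembler, phi3Vision))

newPipeline.fit(testDF).transform(testDF).show()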
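Both examples pass the same hand-written prompt string in the `text` column. A small sketch of building that string from a question, so the special tokens are not retyped for every query. `build_prompt` is a hypothetical helper that reproduces the single-image, single-turn layout used above and makes no claim about other prompt layouts the model may accept:

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the single-image chat layout used in the examples above."""
    # <|image_1|> marks where the row's image is attached;
    # the trailing <|assistant|> tag cues the model to start its answer.
    return f"<|user|> \n <|image_1|> \n {question.strip()} <|end|>\n <|assistant|>\n"


prompt = build_prompt("What's this picture about?")
print(repr(prompt))
```

In the Python example, the result could then be attached with `image_df.withColumn("text", lit(build_prompt("What's this picture about?")))`.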
Model Information
| Model Name: | phi_3_vision_128k_instruct |
| Compatibility: | Spark NLP 5.5.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [image_assembler] |
| Output Labels: | [answer] |
| Language: | en |
| Size: | 3.3 GB |
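The compatibility row means the model needs Spark NLP 5.5.1 or later. A minimal sketch of checking that before downloading the 3.3 GB model, assuming plain dotted version strings. `parse_version` and `is_compatible` are hypothetical helpers written for illustration; in a live session the installed version string can be obtained from `sparknlp.version()`:

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '5.5.1' into (5, 5, 1); suffixes such as 'rc1' are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    # Pad short versions such as '5.5' to three components.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_compatible(installed: str, minimum: str = "5.5.1") -> bool:
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


print(is_compatible("5.5.1"))
print(is_compatible("5.4.2"))
```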
References
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct