Description
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
Originally from https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile
How to use
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

# Start a Spark session with Spark NLP loaded (skip this if `spark` already exists).
spark = sparknlp.start()

# Wrap the text prompt in a document annotation.
documentAssembler = DocumentAssembler() \
    .setInputCol("caption") \
    .setOutputCol("caption_document")

# Wrap the raw image bytes in an image annotation.
imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

# Load every image in the folder and attach the same prompt to each row.
imagesPath = "src/test/resources/image/"
data = ImageAssembler \
    .loadImagesAsBytes(spark, imagesPath) \
    .withColumn("caption", lit("Caption this image."))

# Maximum number of tokens to generate per image.
nPredict = 40

model = AutoGGUFVisionModel.pretrained() \
    .setInputCols(["caption_document", "image_assembler"]) \
    .setOutputCol("completions") \
    .setBatchSize(4) \
    .setNGpuLayers(99) \
    .setNCtx(4096) \
    .setMinKeep(0) \
    .setMinP(0.05) \
    .setNPredict(nPredict) \
    .setNProbs(0) \
    .setPenalizeNl(False) \
    .setRepeatLastN(256) \
    .setRepeatPenalty(1.18) \
    .setStopStrings(["</s>", "Llama:", "User:"]) \
    .setTemperature(0.05) \
    .setTfsZ(1) \
    .setTypicalP(1) \
    .setTopK(40) \
    .setTopP(0.95)

pipeline = Pipeline().setStages([documentAssembler, imageAssembler, model])

# Show the file name of each image next to its generated caption.
pipeline.fit(data).transform(data) \
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result") \
    .show(truncate=False)

// Scala version of the same pipeline.
import com.johnsnowlabs.nlp.ImageAssembler
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Wrap the text prompt in a document annotation.
val documentAssembler = new DocumentAssembler()
  .setInputCol("caption")
  .setOutputCol("caption_document")

// Wrap the raw image bytes in an image annotation.
val imageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

// Load every image in the folder and attach the same prompt to each row.
val imagesPath = "src/test/resources/image/"
val data: DataFrame = ImageAssembler
  .loadImagesAsBytes(ResourceHelper.spark, imagesPath)
  .withColumn("caption", lit("Caption this image."))

// Maximum number of tokens to generate per image.
val nPredict = 40

val model = AutoGGUFVisionModel.pretrained()
  .setInputCols("caption_document", "image_assembler")
  .setOutputCol("completions")
  .setBatchSize(4)
  .setNGpuLayers(99)
  .setNCtx(4096)
  .setMinKeep(0)
  .setMinP(0.05f)
  .setNPredict(nPredict)
  .setNProbs(0)
  .setPenalizeNl(false)
  .setRepeatLastN(256)
  .setRepeatPenalty(1.18f)
  .setStopStrings(Array("</s>", "Llama:", "User:"))
  .setTemperature(0.05f)
  .setTfsZ(1)
  .setTypicalP(1)
  .setTopK(40)
  .setTopP(0.95f)

val pipeline = new Pipeline().setStages(Array(documentAssembler, imageAssembler, model))

// Show the file name of each image next to its generated caption.
pipeline
  .fit(data)
  .transform(data)
  .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result")
  .show(truncate = false)
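
The sampling setters used above (`setTemperature`, `setTopK`, `setTopP`, `setMinP`) all narrow the set of tokens the model may pick at each step. The following is a minimal pure-Python sketch of that idea, using the same default values as the example. The function `filter_candidates` and the toy logits are hypothetical and are not part of Spark NLP; the actual backend may apply these filters in a different order and combine them with others (for example the repeat penalty), so treat this only as an illustration of what each parameter controls.

```python
import math

def filter_candidates(logits, temperature=0.05, top_k=40, top_p=0.95, min_p=0.05):
    """Return {token: probability} after temperature, top-k, top-p and min-p filtering.

    Illustrative only: `logits` is a hypothetical {token: score} mapping.
    """
    # Temperature: dividing by a value below 1 sharpens the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # Softmax, subtracting the maximum first for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Min-p: drop tokens far less likely than the best remaining token.
    best = kept[0][1]
    kept = [(tok, prob) for tok, prob in kept if prob >= min_p * best]

    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}

toy_logits = {"cat": 5.0, "dog": 4.0, "car": 1.0}  # made-up scores
print(filter_candidates(toy_logits))                   # temperature 0.05, as in the example
print(filter_candidates(toy_logits, temperature=1.0))  # a flatter distribution
```

With a temperature as low as 0.05, even a small gap between logits leaves essentially one candidate, which is why the captions in this example are close to deterministic. Raising the temperature lets more tokens survive the top-p and min-p cuts.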
Results
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|image_name |result |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|palace.JPEG |[ The image depicts a large, ornate room with high ceilings and beautifully decorated walls. There are several chairs placed throughout the space, some of which have cushions] |
|egyptian_cat.jpeg|[ The image features two cats lying on a pink surface, possibly a bed or sofa. One cat is positioned towards the left side of the scene and appears to be sleeping while holding] |
|hippopotamus.JPEG|[ A large brown hippo is swimming in a body of water, possibly an aquarium. The hippo appears to be enjoying its time in the water and seems relaxed as it floats] |
|hen.JPEG |[ The image features a large chicken standing next to several baby chickens. In total, there are five birds in the scene: one adult and four young ones. They appear to be gathered together] |
|ostrich.JPEG |[ The image features a large, long-necked bird standing in the grass. It appears to be an ostrich or similar species with its head held high and looking around. In addition to] |
|junco.JPEG |[ A small bird with a black head and white chest is standing on the snow. It appears to be looking at something, possibly food or another animal in its vicinity. The scene takes place out] |
|bluetick.jpg |[ A dog with a red collar is sitting on the floor, looking at something. The dog appears to be staring into the distance or focusing its attention on an object in front of it.] |
|chihuahua.jpg |[ A small brown dog wearing a sweater is sitting on the floor. The dog appears to be looking at something, possibly its owner or another animal in the room. It seems comfortable and relaxed]|
|tractor.JPEG |[ A man is sitting in the driver's seat of a green tractor, which has yellow wheels and tires. The tractor appears to be parked on top of an empty field with] |
|ox.JPEG |[ A large bull with horns is standing in a grassy field.] |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
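
Most captions above stop mid-sentence because generation is capped at `nPredict = 40` tokens. If complete sentences are preferred, one option is to post-process the `result` strings after collecting them. The helper below is a hypothetical sketch, not a Spark NLP function: it keeps everything up to the last sentence-ending punctuation mark and returns the text unchanged when no sentence has finished. It uses a simple punctuation rule, so abbreviations such as "e.g." would be treated as sentence ends.

```python
import re

def trim_to_last_sentence(completion: str) -> str:
    """Drop a trailing sentence fragment from a generated caption."""
    text = completion.strip()
    # Positions just after '.', '!' or '?' when followed by whitespace or end of text.
    ends = [match.end() for match in re.finditer(r"[.!?](?=\s|$)", text)]
    # No finished sentence: return the text as-is rather than an empty string.
    return text[:ends[-1]] if ends else text

truncated = (" The image depicts a large, ornate room with high ceilings and "
             "beautifully decorated walls. There are several chairs placed "
             "throughout the space, some of which have cushions")
print(trim_to_last_sentence(truncated))
```

Raising `nPredict` is the alternative when longer captions are acceptable; the trimming approach keeps generation time bounded.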
Model Information
| Model Name:    | llava_v1.5_7b_Q4_0_gguf             |
| Compatibility: | Spark NLP 6.0.0+                    |
| License:       | Open Source                         |
| Edition:       | Official                            |
| Input Labels:  | [caption_document, image_assembler] |
| Output Labels: | [completions]                       |
| Language:      | en                                  |
| Size:          | 4.2 GB                              |
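
As a rough sanity check on the listed size, the `Q4_0` in the model name refers to a 4-bit quantization format that stores weights in blocks of 32, each block holding 16 bytes of packed 4-bit values plus a 2-byte scale. The sketch below estimates the size of the language-model weights under that layout. The parameter count is an assumption (roughly 6.74 billion for a LLaMA-style 7B model) and is not stated in this card, and the estimate ignores the vision components, any tensors kept at higher precision, and file metadata.

```python
# Q4_0 block layout: 32 weights -> 16 bytes of 4-bit values + 2-byte fp16 scale.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 16 + 2

def q4_0_bytes(n_params: float) -> float:
    """Approximate storage for n_params weights quantized as Q4_0."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES

ASSUMED_LLM_PARAMS = 6.74e9  # assumption, not taken from this model card

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
estimate_gb = q4_0_bytes(ASSUMED_LLM_PARAMS) / 1e9

print(f"{bits_per_weight} bits per weight")   # 4.5 bits per weight
print(f"{estimate_gb:.2f} GB for the language model alone")
```

The estimate comes out below the listed 4.2 GB, which is consistent with the file containing more than the quantized language-model weights, although this card does not itemise the contents.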