LLaVA v1.5 7B Q4 GGUF

Description

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Originally from https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile

Predicted Entities

Download Copy S3 URI

How to use

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

documentAssembler = DocumentAssembler() \
    .setInputCol("caption") \
    .setOutputCol("caption_document")
imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imagesPath = "src/test/resources/image/"
data = ImageAssembler \
    .loadImagesAsBytes(spark, imagesPath) \
    .withColumn("caption", lit("Caption this image.")) # Add a caption to each image.

nPredict = 40
model = AutoGGUFVisionModel.pretrained() \
    .setInputCols(["caption_document", "image_assembler"]) \
    .setOutputCol("completions") \
    .setBatchSize(4) \
    .setNGpuLayers(99) \
    .setNCtx(4096) \
    .setMinKeep(0) \
    .setMinP(0.05) \
    .setNPredict(nPredict) \
    .setNProbs(0) \
    .setPenalizeNl(False) \
    .setRepeatLastN(256) \
    .setRepeatPenalty(1.18) \
    .setStopStrings(["</s>", "Llama:", "User:"]) \
    .setTemperature(0.05) \
    .setTfsZ(1) \
    .setTypicalP(1) \
    .setTopK(40) \
    .setTopP(0.95)

pipeline = Pipeline().setStages([documentAssembler, imageAssembler, model])
pipeline.fit(data).transform(data) \
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result") \
    .show(truncate = False)

import com.johnsnowlabs.nlp.ImageAssembler
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.base._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val documentAssembler = new DocumentAssembler()
  .setInputCol("caption")
  .setOutputCol("caption_document")

val imageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val imagesPath = "src/test/resources/image/"
val data: DataFrame = ImageAssembler
  .loadImagesAsBytes(ResourceHelper.spark, imagesPath)
  .withColumn("caption", lit("Caption this image.")) // Add a caption to each image.

val nPredict = 40
val model = AutoGGUFVisionModel.pretrained()
  .setInputCols("caption_document", "image_assembler")
  .setOutputCol("completions")
  .setBatchSize(4)
  .setNGpuLayers(99)
  .setNCtx(4096)
  .setMinKeep(0)
  .setMinP(0.05f)
  .setNPredict(nPredict)
  .setNProbs(0)
  .setPenalizeNl(false)
  .setRepeatLastN(256)
  .setRepeatPenalty(1.18f)
  .setStopStrings(Array("</s>", "Llama:", "User:"))
  .setTemperature(0.05f)
  .setTfsZ(1)
  .setTypicalP(1)
  .setTopK(40)
  .setTopP(0.95f)

val pipeline = new Pipeline().setStages(Array(documentAssembler, imageAssembler, model))
pipeline
  .fit(data)
  .transform(data)
  .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result")
  .show(truncate = false)

Results

+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|image_name       |result                                                                                                                                                                                        |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|palace.JPEG      |[ The image depicts a large, ornate room with high ceilings and beautifully decorated walls. There are several chairs placed throughout the space, some of which have cushions]               |
|egyptian_cat.jpeg|[ The image features two cats lying on a pink surface, possibly a bed or sofa. One cat is positioned towards the left side of the scene and appears to be sleeping while holding]             |
|hippopotamus.JPEG|[ A large brown hippo is swimming in a body of water, possibly an aquarium. The hippo appears to be enjoying its time in the water and seems relaxed as it floats]                            |
|hen.JPEG         |[ The image features a large chicken standing next to several baby chickens. In total, there are five birds in the scene: one adult and four young ones. They appear to be gathered together] |
|ostrich.JPEG     |[ The image features a large, long-necked bird standing in the grass. It appears to be an ostrich or similar species with its head held high and looking around. In addition to]              |
|junco.JPEG       |[ A small bird with a black head and white chest is standing on the snow. It appears to be looking at something, possibly food or another animal in its vicinity. The scene takes place out]  |
|bluetick.jpg     |[ A dog with a red collar is sitting on the floor, looking at something. The dog appears to be staring into the distance or focusing its attention on an object in front of it.]              |
|chihuahua.jpg    |[ A small brown dog wearing a sweater is sitting on the floor. The dog appears to be looking at something, possibly its owner or another animal in the room. It seems comfortable and relaxed]|
|tractor.JPEG     |[ A man is sitting in the driver's seat of a green tractor, which has yellow wheels and tires. The tractor appears to be parked on top of an empty field with]                                |
|ox.JPEG          |[ A large bull with horns is standing in a grassy field.]                                                                                                                                     |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Model Information

Model Name:	llava_v1.5_7b_Q4_0_gguf
Compatibility:	Spark NLP 6.0.0+
License:	Open Source
Edition:	Official
Input Labels:	[caption_document, image_assembler]
Output Labels:	[completions]
Language:	en
Size:	4.2 GB

PREVIOUSbge_small_en_v1.5 model from BAAI

NEXTEnglish 080524_15ep_02 RoBertaForZeroShotClassification from adriansanz