Description
PaliGemma 2 is an update of the PaliGemma vision-language model (VLM) that incorporates the capabilities of the Gemma 2 models. The PaliGemma family of models is inspired by PaLI-3 and based on open components such as the SigLIP vision model and Gemma 2 language models. It takes both images and text as input and generates text as output, and it supports multiple languages. It is designed for class-leading fine-tuning performance on a wide range of vision-language tasks such as image and short video captioning, visual question answering, text reading, object detection, and object segmentation.
How to use
# Python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

images_path = "path/to/your/images"  # Replace with your image path
image_df = spark.read.format("image").load(path=images_path)

# Attach the same text prompt to every image row
test_df = image_df.withColumn("text", lit("USER: \n <image> \nDescribe this image. \nASSISTANT:\n"))

# Parentheses let each method chain span several lines
imageAssembler = (
    ImageAssembler()
    .setInputCol("image")
    .setOutputCol("image_assembler")
)

visualQAClassifier = (
    PaliGemmaForMultiModal.pretrained()
    .setInputCols("image_assembler")
    .setOutputCol("answer")
)

pipeline = Pipeline().setStages([
    imageAssembler,
    visualQAClassifier
])

result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(truncate=False)
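The Python example hard-codes one instruction inside the prompt string. As a minimal sketch, the prompt could be built from a template so that only the instruction changes between runs. `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names, not part of Spark NLP; the template text is copied verbatim from the example, and this card does not state how well other instructions work with this checkpoint.

```python
# Hypothetical helper, not part of Spark NLP.
# The template is copied verbatim from the example prompt on this page.
PROMPT_TEMPLATE = "USER: \n <image> \n{instruction} \nASSISTANT:\n"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the prompt layout used in the example."""
    cleaned = instruction.strip()
    if not cleaned:
        raise ValueError("instruction must not be empty")
    return PROMPT_TEMPLATE.format(instruction=cleaned)


caption_prompt = build_prompt("Describe this image.")
print(repr(caption_prompt))
```

The returned string can be passed to `lit(...)` in place of the literal prompt when creating the `text` column.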
// Scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val imageFolder = "path/to/your/images" // Replace with your image path

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

val testDF: DataFrame = imageDF.withColumn("text", lit("USER: \n <image> \nDescribe this image. \nASSISTANT:\n"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val visualQAClassifier = PaliGemmaForMultiModal.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  visualQAClassifier
))

val result = pipeline.fit(testDF).transform(testDF)
result.select("image_assembler.origin", "answer.result").show(false)
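Both examples end by selecting `image_assembler.origin` and `answer.result`, each of which is an array column. The sketch below shows one way to flatten such rows into (origin, answer) pairs in Python once they have been collected to the driver. `pair_answers` is a hypothetical helper, and the sample row is placeholder data written for illustration, not real model output. It assumes each row holds two equal-length sequences, as `result.select(...).collect()` would be expected to return.

```python
from typing import Iterable, List, Sequence, Tuple


def pair_answers(rows: Iterable[Sequence]) -> List[Tuple[str, str]]:
    """Flatten (origins, answers) rows into (origin, answer) pairs.

    Assumes row[0] is the list of image origins and row[1] the list of
    generated answers, in the same order as the select() call.
    """
    pairs = []
    for row in rows:
        origins, answers = row[0], row[1]
        for origin, answer in zip(origins, answers):
            pairs.append((origin, answer.strip()))
    return pairs


# Placeholder row standing in for collected Spark rows (not real output).
sample_rows = [(["file:///images/cat.jpg"], [" A cat on a sofa. "])]

for origin, answer in pair_answers(sample_rows):
    print(f"{origin}: {answer}")
```

Collecting to the driver is only sensible for small result sets; for larger ones the same pairing would be better expressed with DataFrame operations.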
Model Information
| Model Name: | paligemma_3b_pt_224_int4 |
| Compatibility: | Spark NLP 5.5.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [image_assembler] |
| Output Labels: | [answer] |
| Language: | en |
| Size: | 3.1 GB |
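The compatibility row lists Spark NLP 5.5.1+ as the minimum version. A small sketch of how that requirement might be checked before loading the model follows. `parse_version` and `is_compatible` are hypothetical helpers, and the check assumes plain `major.minor.patch` version strings.

```python
import re

MINIMUM_SPARK_NLP = "5.5.1"  # taken from the Compatibility row above


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '5.5.1' -> (5, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(installed: str, minimum: str = MINIMUM_SPARK_NLP) -> bool:
    """Compare versions numerically, so that 5.10.0 counts as newer than 5.5.1."""
    return parse_version(installed) >= parse_version(minimum)


print(is_compatible("5.5.0"))
print(is_compatible("5.10.0"))
```

In a live session the installed version string would typically come from `sparknlp.version()`; it is passed in as an argument here so the sketch runs without Spark installed.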