Description
SmolVLM is a compact, open multimodal model that takes arbitrary sequences of image and text inputs and produces text outputs. Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded in multiple images, or act as a pure language model with no visual input. Its lightweight architecture makes it suitable for on-device applications while retaining strong performance on multimodal tasks.
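The model is driven entirely by the prompt string, so it can help to build that string programmatically. The sketch below copies the template used in the usage example on this page (`<|im_start|>User:<image>...<end_of_utterance>\nAssistant:`). Two parts of it are assumptions and are not confirmed by this card: emitting one `<image>` placeholder per image for multi-image prompts, and using `num_images=0` for text-only prompts. This card does not say whether the Spark NLP annotator accepts more than one image per row. The helper name `build_smolvlm_prompt` is hypothetical.

```python
def build_smolvlm_prompt(question: str, num_images: int = 1) -> str:
    """Build a chat-style SmolVLM prompt (hypothetical helper).

    Emits one <image> placeholder per image before the question.
    num_images=0 yields a text-only prompt (assumed to be supported).
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    image_tokens = "<image>" * num_images
    return f"<|im_start|>User:{image_tokens}{question}<end_of_utterance>\nAssistant:"


# Reproduces the single-image prompt used in the pipeline example
print(build_smolvlm_prompt("Can you describe the image?"))
```

The output of the call above can be passed directly to `lit(...)` when adding the `text` column.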
How to use
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
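from pyspark.sql.functions import lit

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

images_path = "path/to/images"  # Replace with your image folder
image_df = spark.read.format("image").load(path=images_path)

# Attach the chat-formatted prompt that SmolVLM answers for every image
test_df = image_df.withColumn(
    "text",
    lit("<|im_start|>User:<image>Can you describe the image?<end_of_utterance>\nAssistant:")
)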
# Convert raw images into Spark NLP image annotations
imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

# Generate an answer for each image/prompt pair
visualQAClassifier = SmolVLMTransformer.pretrained() \
    .setInputCols("image_assembler") \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([
    imageAssembler,
    visualQAClassifier
])
result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(truncate=False)
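To use the answers in Python rather than only displaying them, the selected columns can be collected and reshaped. In Spark NLP, annotation columns hold an array of annotations per row, so both `image_assembler.origin` and `answer.result` come back as lists. The sketch below maps each image's file name to its answer text. The helper name `answers_by_file` is hypothetical, and the sample row in the usage line is illustrative data, not real model output.

```python
import posixpath
from typing import Dict, Iterable, Sequence, Tuple


def answers_by_file(
    rows: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> Dict[str, str]:
    """Map each image file name to its generated answer (hypothetical helper).

    Each row is (origins, answers). Both are lists because Spark NLP
    annotation columns store a list of annotations per row.
    """
    out: Dict[str, str] = {}
    for origins, answers in rows:
        if not origins:  # skip rows without a source path
            continue
        # origin is a path/URI such as "file:/data/cat.jpg"; keep the file name
        name = posixpath.basename(origins[0])
        out[name] = " ".join(answers)
    return out


# Illustrative input shaped like result.select(...).collect()
sample = [(["file:/tmp/images/cat.jpg"], ["A cat sitting on a sofa."])]
print(answers_by_file(sample))
```

With the pipeline above, call it as `answers_by_file(result.select("image_assembler.origin", "answer.result").collect())`. PySpark `Row` objects are tuples, so they unpack the same way. Images with the same file name in different folders would overwrite each other, so keep the full origin as the key if that can happen.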
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
val imageFolder = "path/to/images" // Replace with your image folder
val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)
// Attach the chat-formatted prompt that SmolVLM answers for every image
val testDF: DataFrame = imageDF.withColumn(
  "text",
  lit("<|im_start|>User:<image>Can you describe the image?<end_of_utterance>\nAssistant:"))
val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")
val visualQAClassifier = SmolVLMTransformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")
val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  visualQAClassifier
))
val result = pipeline.fit(testDF).transform(testDF)
result.select("image_assembler.origin", "answer.result").show(truncate=false)
Model Information
Model Name: smolvlm_instruct_int4
Compatibility: Spark NLP 5.5.1+
License: Open Source
Edition: Official
Input Labels: [image_assembler]
Output Labels: [answer]
Language: en
Size: 1.8 GB
References
https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct