Description
Janus is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus surpasses previous unified model and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models.
Predicted Entities
How to use
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit
image_df = spark.read.format("image").load(path=images_path) # Replace with your image path
test_df = image_df.withColumn("text", lit("You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n\nUser: <image_placeholder>Describe image in details\n\nAssistant:"))
imageAssembler = ImageAssembler()
.setInputCol("image")
.setOutputCol("image_assembler")
visualQAClassifier = JanusForMultiModal.pretrained()
.setInputCols("image_assembler")
.setOutputCol("answer")
pipeline = Pipeline().setStages([
imageAssembler,
visualQAClassifier
])
result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(false)
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
val imageDF: DataFrame = spark.read
.format("image")
.option("dropInvalid", value = true)
.load(imageFolder) // Replace with your image folder
val testDF: DataFrame = imageDF.withColumn("text", lit("You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n\nUser: <image_placeholder>Describe image in details\n\nAssistant:"))
val imageAssembler: ImageAssembler = new ImageAssembler()
.setInputCol("image")
.setOutputCol("image_assembler")
val visualQAClassifier = JanusForMultiModal.pretrained()
.setInputCols("image_assembler")
.setOutputCol("answer")
val pipeline = new Pipeline().setStages(Array(
imageAssembler,
visualQAClassifier
))
val result = pipeline.fit(testDF).transform(testDF)
result.select("image_assembler.origin", "answer.result").show(false)
Model Information
Model Name: | janus_1_3b_int4 |
Compatibility: | Spark NLP 5.5.1+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [image_assembler] |
Output Labels: | [answer] |
Language: | en |
Size: | 1.9 GB |
References
https://huggingface.co/deepseek-ai/Janus-1.3B