Description
Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. It belongs to the Phi-4 model family and supports a 128K-token context length. After pretraining, the model was enhanced with supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.
Original model from https://huggingface.co/microsoft/Phi-4-mini-instruct
How to use
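# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import AutoGGUFModel
from pyspark.ml import Pipeline
# Start a Spark session with Spark NLP (skip if `spark` already exists)
spark = sparknlp.start()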
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
auto_gguf_model = AutoGGUFModel.pretrained("phi_4_mini_instruct_q4_k_m_gguf", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("completions") \
    .setBatchSize(4) \
    .setNPredict(20) \
    .setNGpuLayers(99) \
    .setTemperature(0.4) \
    .setTopK(40) \
    .setTopP(0.9) \
    .setPenalizeNl(True)
pipeline = Pipeline().setStages([
    document_assembler,
    auto_gguf_model
])
data = spark.createDataFrame([
    ["The moon is "]
]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
result.select("completions").show(truncate=False)
// Scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.AutoGGUFModel
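import org.apache.spark.ml.Pipeline
// Needed for .toDF on a Seq; assumes a SparkSession named `spark` (as in spark-shell)
import spark.implicits._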
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val autoGGUFModel = AutoGGUFModel.pretrained("phi_4_mini_instruct_q4_k_m_gguf", "en")
  .setInputCols("document")
  .setOutputCol("completions")
  .setBatchSize(4)
  .setNPredict(20)
  .setNGpuLayers(99)
  .setTemperature(0.4f)
  .setTopK(40)
  .setTopP(0.9f)
  .setPenalizeNl(true)
val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  autoGGUFModel
))
val data = Seq("The moon is ").toDF("text")
val model = pipeline.fit(data)
val result = model.transform(data)
result.select("completions").show(false)
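The sampling settings used in the pipelines above (`setTemperature(0.4)`, `setTopK(40)`, `setTopP(0.9)`) control which tokens the model may pick next. The following is a toy sketch in plain Python of how such filters are commonly combined. It is not the implementation used by Spark NLP or the underlying runtime; the function name `filter_candidates` and the logit values are made up for illustration, and real samplers may apply these steps in a different order.

```python
import math


def filter_candidates(logits, temperature=0.4, top_k=40, top_p=0.9):
    """Return {token: probability} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration only; `logits` maps tokens to raw scores.
    """
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: weight / total for tok, weight in weights.items()}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalise the surviving candidates so they sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Made-up scores for possible continuations of "The moon is ".
toy_logits = {"bright": 2.0, "full": 1.5, "made": 0.5, "cheese": -1.0}
candidates = filter_candidates(toy_logits)
print(candidates)
```

With these made-up scores only the two most likely tokens survive the top-p cut. In this sketch, lowering the temperature or `top_p` narrows the candidate set further, while raising them lets less likely tokens through.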
Results
The moon orbits Earth and is our closest natural satellite. It's about 384,400 kilometers away,
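The completion above stops mid-sentence, which is consistent with the 20-token limit set by `setNPredict(20)`. Raising that limit is one option. Another, if a clean display string is wanted, is to trim the text after collecting it. The helper below is a minimal sketch in plain Python; `trim_to_last_sentence` is a hypothetical name, not part of Spark NLP, and the rule is naive (a decimal point would count as a sentence end).

```python
def trim_to_last_sentence(text):
    """Drop a trailing partial sentence; return text unchanged if no sentence end is found."""
    # Position of the last sentence-ending punctuation mark, or -1 if there is none.
    end = max(text.rfind(mark) for mark in ".!?")
    return text[: end + 1] if end != -1 else text


completion = (
    "The moon orbits Earth and is our closest natural satellite. "
    "It's about 384,400 kilometers away,"
)
trimmed = trim_to_last_sentence(completion)
print(trimmed)
```

Applied to the result shown above, this keeps only the first sentence and drops the unfinished second one.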
Model Information
| Model Name: | phi_4_mini_instruct_q4_k_m_gguf | 
| Compatibility: | Spark NLP 6.0.3+ | 
| License: | Open Source | 
| Edition: | Official | 
| Input Labels: | [document] | 
| Output Labels: | [completions] | 
| Language: | en | 
| Size: | 2.5 GB |