OLMo 1B

Description

OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs (coming soon), and details involved in training these models. This model was converted from allenai/OLMo-1B to the Hugging Face Transformers format and imported into Spark NLP.

Predicted Entities


How to use

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import OLMoTransformer
from pyspark.ml import Pipeline

data = spark.createDataFrame([
    [1, "My name is Leo, "]]).toDF("id", "text")

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

olmo_loaded = OLMoTransformer \
    .pretrained("olmo_1b_int4", "en") \
    .setMaxOutputLength(50) \
    .setDoSample(False) \
    .setInputCols(["documents"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([document_assembler, olmo_loaded])
results = pipeline.fit(data).transform(data)

results.select("generation.result").show(truncate=False)
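The example above uses greedy decoding (setDoSample(False)). For more varied output you can enable sampling. The sketch below assumes OLMoTransformer exposes the same generation setters as other Spark NLP text-generation annotators (setMinOutputLength, setTemperature, setTopK, setTopP, setRepetitionPenalty, setNoRepeatNgramSize); verify them against your Spark NLP version before relying on them.

```python
# Configuration sketch, not verified output: sampling-based generation.
# Setter availability is assumed from other Spark NLP generation annotators.
olmo_sampling = OLMoTransformer \
    .pretrained("olmo_1b_int4", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("generation") \
    .setMinOutputLength(10) \
    .setMaxOutputLength(50) \
    .setDoSample(True) \
    .setTemperature(0.7) \
    .setTopK(50) \
    .setTopP(0.9) \
    .setRepetitionPenalty(1.1) \
    .setNoRepeatNgramSize(3)
```

Lower temperature and top-p values keep the output closer to greedy decoding; higher values increase diversity at the cost of coherence.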
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.OLMoTransformer
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val seq2seq = OLMoTransformer.pretrained("olmo_1b_int4", "en")
    .setInputCols(Array("document"))
    .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("My name is Leo, ").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

Model Information

Model Name: olmo_1b_int4
Compatibility: Spark NLP 5.5.1+
License: Open Source
Edition: Official
Input Labels: [documents]
Output Labels: [generation]
Language: en
Size: 1.1 GB

References

https://huggingface.co/allenai/OLMo-1B-hf