Description
OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs (coming soon), and details involved in training these models. This model was converted from allenai/OLMo-1B to the Hugging Face Transformers format.
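For reference, the underlying Hugging Face checkpoint can also be loaded directly with the transformers library. This is a minimal sketch, not part of the Spark NLP usage below, and it assumes a recent transformers version with native OLMo support:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the converted checkpoint referenced in this card
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")

# Greedy generation, mirroring setDoSample(False) in the pipeline below
inputs = tokenizer("My name is Leo, ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))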
Predicted Entities
How to use
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import OLMoTransformer
from pyspark.ml import Pipeline

spark = sparknlp.start()

data = spark.createDataFrame([[1, "My name is Leo, "]]).toDF("id", "text")

# Turn raw text into Spark NLP document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

olmo_loaded = OLMoTransformer \
    .pretrained("olmo_1b_int4", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("generation") \
    .setMaxOutputLength(50) \
    .setDoSample(False)

pipeline = Pipeline().setStages([document_assembler, olmo_loaded])
results = pipeline.fit(data).transform(data)
results.select("generation.result").show(truncate=False)
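For quick, single-string experimentation without building a DataFrame, the fitted pipeline can be wrapped in Spark NLP's LightPipeline. This is a sketch that reuses the pipeline and data objects defined above:

from sparknlp.base import LightPipeline

# Wrap the fitted pipeline for lightweight, driver-side inference
light_model = LightPipeline(pipeline.fit(data))

# annotate() returns a dict mapping each output column to its results
result = light_model.annotate("My name is Leo, ")
print(result["generation"])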
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.OLMoTransformer
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val seq2seq = OLMoTransformer.pretrained("olmo_1b_int4", "en")
  .setInputCols(Array("document"))
  .setOutputCol("generation")
  .setMaxOutputLength(50)
  .setDoSample(false)

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("My name is Leo, ").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
pipelineDF.select("generation.result").show(truncate = false)
Model Information
| Model Name: | olmo_1b_int4 |
| Compatibility: | Spark NLP 5.5.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [documents] |
| Output Labels: | [generation] |
| Language: | en |
| Size: | 1.1 GB |
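Since the model requires Spark NLP 5.5.1 or later, you can confirm your environment meets the compatibility requirement before loading it. A minimal check:

import sparknlp

# Start (or attach to) a Spark session with Spark NLP and print the library version;
# it should be 5.5.1 or higher for this model
spark = sparknlp.start()
print(sparknlp.version())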
References
https://huggingface.co/allenai/OLMo-1B-hf