GPT2 text-to-text model (Base)

Description

“GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where the model is primed with an input and it generates a lengthy continuation.”

How to use

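from sparknlp.base import DocumentAssembler
from sparknlp.annotator import GPT2Transformer
from pyspark.ml import Pipeline

# Assumes an active SparkSession named `spark`, e.g. created with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Load the pretrained model and cap the length of the generated output
gpt2 = GPT2Transformer.pretrained("gpt2") \
    .setInputCols(["documents"]) \
    .setMaxOutputLength(50) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, gpt2])
data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text")
result = pipeline.fit(data).transform(data)
# The generated text is in the `result` field of the `generation` column
result.select("generation.result").show(truncate=False)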

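import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.GPT2Transformer
import org.apache.spark.ml.Pipeline
// Needed for .toDF on a Seq; assumes an active SparkSession named `spark`
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Load the pretrained model and configure generation
val gpt2 = GPT2Transformer.pretrained("gpt2")
  .setInputCols(Array("documents"))
  .setMinOutputLength(10)
  .setMaxOutputLength(50)
  .setDoSample(false)
  .setTopK(50)
  .setNoRepeatNgramSize(3)
  .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, gpt2))

val data = Seq("My name is Leonardo.").toDF("text")
val result = pipeline.fit(data).transform(data)
// The generated text is in the `result` field of the `generation` column
result.select("generation.result").show(truncate = false)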
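
The Scala example sets setNoRepeatNgramSize(3) together with setDoSample(false). As a rough illustration of what a no-repeat n-gram constraint generally means during greedy decoding, here is a toy sketch in plain Python. It is not Spark NLP's implementation: the helper names banned_next_tokens and greedy_pick, the word-level tokens, and the scores are all hypothetical and chosen only for illustration.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would complete an n-gram already present in `tokens`.

    Toy illustration only; real decoders work on token ids and model logits.
    """
    if ngram_size <= 0 or len(tokens) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram being built next.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        # If an earlier n-gram starts with the same prefix, repeating its
        # final token would reproduce that n-gram, so the token is banned.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


def greedy_pick(scores, banned):
    """Pick the highest-scoring token that is not banned (greedy decoding)."""
    allowed = {tok: s for tok, s in scores.items() if tok not in banned}
    return max(allowed, key=allowed.get)


# Hypothetical history and next-token scores.
history = ["my", "name", "is", "leo", "my", "name"]
banned = banned_next_tokens(history, ngram_size=3)
next_token = greedy_pick({"is": 0.9, "was": 0.1}, banned)
print(banned, next_token)
```

In this toy run, "my name is" already occurred, so "is" is excluded after the second "my name" and the greedy step falls back to the next-best candidate. Larger n-gram sizes permit more repetition; smaller ones restrict it more aggressively.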

Model Information

Model Name:	gpt2
Compatibility:	Spark NLP 5.5.0+
License:	Open Source
Edition:	Official
Input Labels:	[documents]
Output Labels:	[generation]
Language:	en
Size:	467.4 MB

References

https://huggingface.co/openai-community/gpt2
