Description
The model was pre-trained with T5’s denoising objective on C4, then further pre-trained with REALM’s salient span masking objective on Wikipedia, and finally fine-tuned on Natural Questions (NQ).
Note: The model was fine-tuned on 100% of the train splits of Natural Questions (NQ) for 10k steps.
Other community Checkpoints: here
Paper: How Much Knowledge Can You Pack Into the Parameters of a Language Model?
How to use
Either set one of the following tasks on the annotator with setTask, or write the task prefix inline at the start of your input text:
- nq question:
- trivia question:
- question:
- nq:
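The inline option can be sketched in plain Python as follows. This is a minimal illustration, not part of the Spark NLP API: the helper name `with_task_prefix`, the example questions, and the single space between prefix and question are assumptions made here. When the prefix is written inline like this, the annotator's task would presumably be left unset so the prefix is not applied twice.

```python
# Task prefixes listed in this model card.
SUPPORTED_TASKS = ("nq question:", "trivia question:", "question:", "nq:")


def with_task_prefix(question: str, task: str = "nq:") -> str:
    """Return the question with the task prefix written inline.

    Hypothetical helper; the single-space separator is an assumption.
    """
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task prefix: {task!r}")
    return f"{task} {question.strip()}"


# Placeholder questions, used only for illustration.
questions = ["When was the Eiffel Tower built?", "  Who wrote Hamlet?  "]

# One single-column row per question, ready to become the "text" column.
rows = [[with_task_prefix(q)] for q in questions]

for row in rows:
    print(row[0])
```

With a Spark session available, `rows` could then be turned into the input DataFrame, for example with `spark.createDataFrame(rows).toDF("text")`.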
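# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import T5Transformer
from pyspark.ml import Pipeline
spark = sparknlp.start()
# Placeholder question; the pipeline reads the "text" column.
data_df = spark.createDataFrame([["When was the Eiffel Tower built?"]]).toDF("text")
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")
# Load the pretrained model and select the task prefix.
t5 = T5Transformer.pretrained("google_t5_small_ssm_nq") \
    .setTask("nq:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("answer")
pipeline = Pipeline().setStages([document_assembler, t5])
results = pipeline.fit(data_df).transform(data_df)
results.select("answer.result").show(truncate=False)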
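// Scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._
// Placeholder question; the pipeline reads the "text" column.
val dataDf = Seq("When was the Eiffel Tower built?").toDF("text")
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")
// Load the pretrained model and select the task prefix.
val t5 = T5Transformer
  .pretrained("google_t5_small_ssm_nq")
  .setTask("nq:")
  .setInputCols(Array("documents"))
  .setOutputCol("answer")
val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val model = pipeline.fit(dataDf)
val results = model.transform(dataDf)
results.select("answer.result").show(truncate = false)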
# NLU (Python)
import nlu
nlu.load("en.t5").predict("""Put your text here.""")
Model Information
| Field | Value |
|---|---|
| Model Name: | google_t5_small_ssm_nq |
| Compatibility: | Spark NLP 2.7.1+ |
| Edition: | Official |
| Input Labels: | [sentence] |
| Output Labels: | [t5] |
| Language: | en |
Data Source
C4 (denoising pre-training), Wikipedia (salient span masking pre-training), and Natural Questions (fine-tuning), as described in the Description section.