Google's T5 for closed book question answering

Description

This is a text-to-text model trained by Google on the Colossal Clean Crawled Corpus (C4), a cleaned version of Common Crawl’s web crawl corpus, and then fine-tuned on Wikipedia and the Natural Questions (NQ) dataset. The model can answer free-text questions, such as “Which is the capital of France?”, without relying on any context or external resources.

How to use

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, T5Transformer
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

data = spark.createDataFrame([
    [1, "Which is the capital of France? Who was the first president of USA?"],
    [1, "Which is the capital of Bulgaria ?"],
    [2, "Who is Donald Trump?"]]).toDF("id", "text")

# Wrap the raw text column in document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Split each document into sentences so that every question is answered separately
sentence_detector = SentenceDetectorDLModel \
    .pretrained() \
    .setInputCols(["documents"]) \
    .setOutputCol("questions")

# Generate one answer per detected question
t5 = T5Transformer \
    .pretrained("google_t5_small_ssm_nq") \
    .setInputCols(["questions"]) \
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])
results = pipeline.fit(data).transform(data)

results.select("questions.result", "answers.result").show(truncate=False)

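import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline

val testData = ResourceHelper.spark.createDataFrame(Seq(
  (1, "Which is the capital of France? Who was the first president of USA?"),
  (1, "Which is the capital of Bulgaria ?"),
  (2, "Who is Donald Trump?")
)).toDF("id", "text")

// Wrap the raw text column in document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Split each document into sentences so that every question is answered separately
val sentenceDetector = SentenceDetectorDLModel
  .pretrained()
  .setInputCols(Array("documents"))
  .setOutputCol("questions")

// Generate one answer per detected question
val t5 = T5Transformer
  .pretrained("google_t5_small_ssm_nq")
  .setInputCols(Array("questions"))
  .setOutputCol("answers")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, t5))

val model = pipeline.fit(testData)
val results = model.transform(testData)

results.select("questions.result", "answers.result").show(truncate = false)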

import nlu
nlu.load("en.t5").predict("""Which is the capital of France? Who was the first president of USA?""")

Results

+----------------------------------------------------------------------+--------------------------+
|result                                                                |result                    |
+----------------------------------------------------------------------+--------------------------+
|[Which is the capital of France?, Who was the first president of USA?]|[Paris, George Washington]|
|[Which is the capital of Bulgaria ?]                                  |[Sofia]                   |
|[Who is Donald Trump?]                                                |[a United States citizen] |
+----------------------------------------------------------------------+--------------------------+
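
As the table suggests, each output row holds two parallel arrays: the detected questions and the generated answers, in the same order. The following is a minimal sketch of pairing them up, assuming the rows have already been collected into plain Python lists. The `rows` literal simply copies the table above, and `pair_questions_with_answers` is a hypothetical helper written for illustration, not part of Spark NLP.

```python
from typing import List, Tuple


def pair_questions_with_answers(
    questions: List[str], answers: List[str]
) -> List[Tuple[str, str]]:
    """Zip one row's questions with its answers (hypothetical helper).

    Assumes the two lists are aligned, i.e. one answer per question.
    """
    if len(questions) != len(answers):
        raise ValueError("expected exactly one answer per question")
    return list(zip(questions, answers))


# Rows copied from the results table: (questions.result, answers.result)
rows = [
    (["Which is the capital of France?", "Who was the first president of USA?"],
     ["Paris", "George Washington"]),
    (["Which is the capital of Bulgaria ?"], ["Sofia"]),
    (["Who is Donald Trump?"], ["a United States citizen"]),
]

# Flatten all rows into a single list of (question, answer) tuples
qa_pairs = [
    pair
    for questions, answers in rows
    for pair in pair_questions_with_answers(questions, answers)
]

for question, answer in qa_pairs:
    print(f"{question} -> {answer}")
```

With a live DataFrame, the same rows could presumably be obtained with `results.select("questions.result", "answers.result").collect()`, since each collected row unpacks like a two-element tuple.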

Model Information

Model Name:	google_t5_small_ssm_nq
Compatibility:	Spark NLP 4.0.0+
License:	Open Source
Edition:	Official
Input Labels:	[documents]
Output Labels:	[t5]
Language:	en
Size:	179.1 MB

References

C4, Wikipedia, NQ
