Description
This is a text-to-text model trained by Google on the Colossal Clean Crawled Corpus (C4), a cleaned version of Common Crawl’s web crawl corpus. It was then further pre-trained on Wikipedia with salient span masking (the "ssm" in the model name) and fine-tuned on the Natural Questions (NQ) dataset. The model can answer free-text questions, such as “Which is the capital of France?”, without relying on any context or external resources.
How to use
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, T5Transformer

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

data = spark.createDataFrame([
    [1, "Which is the capital of France? Who was the first president of USA?"],
    [1, "Which is the capital of Bulgaria ?"],
    [2, "Who is Donald Trump?"]]).toDF("id", "text")

# Wrap the raw text column in document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Split each document into sentences so every question is answered separately
sentence_detector = SentenceDetectorDLModel \
    .pretrained() \
    .setInputCols(["documents"]) \
    .setOutputCol("questions")

# Load the pretrained closed-book question answering model
t5 = T5Transformer \
    .pretrained("google_t5_small_ssm_nq") \
    .setInputCols(["questions"]) \
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])
results = pipeline.fit(data).transform(data)
results.select("questions.result", "answers.result").show(truncate=False)
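import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline

val testData = ResourceHelper.spark.createDataFrame(Seq(
  (1, "Which is the capital of France? Who was the first president of USA?"),
  (1, "Which is the capital of Bulgaria ?"),
  (2, "Who is Donald Trump?")
)).toDF("id", "text")

// Wrap the raw text column in document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Split each document into sentences so every question is answered separately
val sentenceDetector = SentenceDetectorDLModel
  .pretrained()
  .setInputCols(Array("documents"))
  .setOutputCol("questions")

// Load the pretrained closed-book question answering model
val t5 = T5Transformer
  .pretrained("google_t5_small_ssm_nq")
  .setInputCols(Array("questions"))
  .setOutputCol("answers")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, t5))
val model = pipeline.fit(testData)
val results = model.transform(testData)
results.select("questions.result", "answers.result").show(truncate = false)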
import nlu
nlu.load("en.t5").predict("""Which is the capital of France? Who was the first president of USA?""")
Results
+----------------------------------------------------------------------+--------------------------+
|result                                                                |result                    |
+----------------------------------------------------------------------+--------------------------+
|[Which is the capital of France?, Who was the first president of USA?]|[Paris, George Washington]|
|[Which is the capital of Bulgaria ?]                                  |[Sofia]                   |
|[Who is Donald Trump?]                                                |[a United States citizen] |
+----------------------------------------------------------------------+--------------------------+
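Each output row holds two parallel arrays: the detected questions and the generated answers. To read them as question/answer pairs, the arrays can be zipped together. The following is a minimal sketch in plain Python, assuming the model emits exactly one answer per question and that the rows have been collected as `(questions, answers)` pairs, for example with `results.select("questions.result", "answers.result").collect()`. The helper name `pair_questions_with_answers` is hypothetical, and the sample rows are copied from the table above.

```python
from typing import List, Sequence, Tuple


def pair_questions_with_answers(
    rows: Sequence[Tuple[List[str], List[str]]]
) -> List[Tuple[str, str]]:
    """Flatten (questions, answers) rows into (question, answer) pairs."""
    pairs: List[Tuple[str, str]] = []
    for questions, answers in rows:
        # Assumption: one generated answer per detected question.
        if len(questions) != len(answers):
            raise ValueError(
                f"expected {len(questions)} answers, got {len(answers)}"
            )
        pairs.extend(zip(questions, answers))
    return pairs


# Sample rows copied from the results table; with a live pipeline these
# would come from collecting the two selected columns instead.
rows = [
    (
        ["Which is the capital of France?", "Who was the first president of USA?"],
        ["Paris", "George Washington"],
    ),
    (["Which is the capital of Bulgaria ?"], ["Sofia"]),
    (["Who is Donald Trump?"], ["a United States citizen"]),
]

for question, answer in pair_questions_with_answers(rows):
    print(f"{question} -> {answer}")
```

Raising on a length mismatch is a deliberate choice in this sketch: silently truncating with `zip` would hide a sentence that received no answer.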
Model Information
Model Name: | google_t5_small_ssm_nq |
Compatibility: | Spark NLP 4.0.0+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [documents] |
Output Labels: | [t5] |
Language: | en |
Size: | 179.1 MB |
References
C4, Wikipedia, NQ