Description
This is a text generation model, originally trained on the SQuAD dataset and then fine-tuned by the AllenAI team to generate questions from text. Its strength is that it can generate questions even from a small number of tokens, for example a subject and a verb (Amazon should provide), which would return a question similar to: What Amazon should provide?
The model can also be used to feed question answering models: the generated question is passed as the first parameter (question), and a longer paragraph is provided as the context. This way, you:
- First, generate questions on the fly
- Second, look for an answer in the text.
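The two steps above can be sketched as follows. This is a minimal sketch rather than the model authors' own pipeline: the helper names `pair_questions_with_context` and `build_qa_stages` are hypothetical, the context sentence is made up for illustration, and the choice of `DistilBertForQuestionAnswering` with its default pretrained model is an assumption (any Spark NLP question answering annotator that takes a question and a context should fit the same slot).

```python
def pair_questions_with_context(questions, context):
    """Build (question, context) rows for a question answering model.

    `questions` are strings produced by the question generation model;
    `context` is the longer paragraph in which answers are searched.
    """
    rows = []
    for question in questions:
        question = question.strip()
        if question:  # skip empty generations
            rows.append((question, context))
    return rows


def build_qa_stages():
    """Return Spark NLP stages for extractive QA (assumed annotator choice)."""
    # Imported lazily so the helper above also works without Spark installed.
    from sparknlp.base import MultiDocumentAssembler
    from sparknlp.annotator import DistilBertForQuestionAnswering

    assembler = MultiDocumentAssembler() \
        .setInputCols(["question", "context"]) \
        .setOutputCols(["document_question", "document_context"])
    qa = DistilBertForQuestionAnswering.pretrained() \
        .setInputCols(["document_question", "document_context"]) \
        .setOutputCol("answer")
    return [assembler, qa]


# Illustrative context only; it does not come from the model card.
rows = pair_questions_with_context(
    ["What will EMV pay?"],
    "EMV will pay the agreed fee within thirty days of signing.",
)
# With a Spark session: spark.createDataFrame(rows, ["question", "context"])
```

Keeping the pairing step separate from the Spark stages makes it easy to reuse one context paragraph for many generated questions.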
Moreover, the input to this model can even be a concatenation of entities from NER (EMV - ORG, will provide - ACTION).
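As a rough illustration of the NER case, the chunk texts can simply be joined in order before being sent to the model. The helper name `entities_to_input` and the (text, label) pair layout are assumptions made for this sketch; they are not part of the Spark NLP API.

```python
def entities_to_input(entities):
    """Join NER chunk texts, in order, into one input string.

    `entities` is a list of (chunk_text, label) pairs. The labels are
    ignored here; only the chunk texts reach the model.
    """
    return " ".join(text.strip() for text, _label in entities if text.strip())


text = entities_to_input([("EMV", "ORG"), ("will provide", "ACTION")])
print(text)  # EMV will provide
```

The resulting string can be used as the `text` column of the input DataFrame.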
Predicted Entities
How to use
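import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import T5Transformer
from pyspark.ml import Pipeline

# Start Spark NLP; skip this if a `spark` session already exists
spark = sparknlp.start()

# Turn the raw text column into document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Load the pretrained question generation model
t5 = T5Transformer.pretrained("t5_question_generation_small") \
    .setTask("") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("question")

data_df = spark.createDataFrame([["EMV will pay"]]).toDF("text")
pipeline = Pipeline().setStages([document_assembler, t5])
results = pipeline.fit(data_df).transform(data_df)
results.select("question.result").show(truncate=False)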
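import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`

// Turn the raw text column into document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Load the pretrained question generation model
val t5 = T5Transformer.pretrained("t5_question_generation_small")
  .setTask("")
  .setMaxOutputLength(200)
  .setInputCols("documents")
  .setOutputCol("question")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("EMV will pay").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("question.result").show(false)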
import nlu
nlu.load("en.t5.small.generation").predict("""EMV will pay""")
Results
+--------------------+
|result              |
+--------------------+
|[What will EMV pay?]|
+--------------------+
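The `result` column holds one list of generated strings per input row. A small sketch of how the collected output might be flattened into a plain Python list follows; the helper name `flatten_questions` is hypothetical, and removing duplicates is a design choice of this sketch rather than something the model does.

```python
def flatten_questions(result_column):
    """Flatten the `question.result` column into a list of unique questions."""
    seen = set()
    questions = []
    for generated in result_column:  # one list of strings per input row
        for question in generated:
            question = question.strip()
            if question and question not in seen:
                seen.add(question)
                questions.append(question)
    return questions


# With Spark: [row.result for row in results.select("question.result").collect()]
collected = [["What will EMV pay?"]]
print(flatten_questions(collected))  # ['What will EMV pay?']
```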
Model Information
| Model Name: | t5_question_generation_small |
| Compatibility: | Spark NLP 4.0.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [documents] |
| Output Labels: | [summaries] |
| Language: | en |
| Size: | 148.0 MB |
References
SQuAD 2.0