TREC(6) Question Classifier

Description

Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations, or Numeric Values.

Predicted Entities

ABBR, DESC, NUM, ENTY, LOC, HUM.

Live Demo Open in Colab Download Copy S3 URI

How to use

documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")

use = UniversalSentenceEncoder.pretrained(lang="en") \
.setInputCols(["document"])\
.setOutputCol("sentence_embeddings")

document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec6', 'en') \
.setInputCols(["document", "sentence_embeddings"]) \
.setOutputCol("class")

nlpPipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')
val documentAssembler = DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val use = UniversalSentenceEncoder.pretrained(lang="en")
.setInputCols(Array("document"))
.setOutputCol("sentence_embeddings")

val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
.setInputCols(Array("document", "sentence_embeddings"))
.setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("When did the construction of stone circles begin in the UK?").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu

text = ["""When did the construction of stone circles begin in the UK?"""]
trec6_df = nlu.load('en.classify.trec6.use').predict(text, output_level='document')
trec6_df[["document", "trec6"]]

Results

+------------------------------------------------------------------------------------------------+------------+
|document                                                                                        |class       |
+------------------------------------------------------------------------------------------------+------------+
|When did the construction of stone circles begin in the UK?                                     | NUM        |
+------------------------------------------------------------------------------------------------+------------+

Model Information

Model Name: classifierdl_use_trec6
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en

Benchmarking

precision    recall  f1-score   support

ABBR       0.00      0.00      0.00        26
DESC       0.89      0.96      0.92       343
ENTY       0.86      0.86      0.86       391
HUM       0.91      0.90      0.91       366
LOC       0.88      0.91      0.89       233
NUM       0.94      0.94      0.94       274

accuracy                           0.89      1633
macro avg       0.75      0.76      0.75      1633
weighted avg       0.88      0.89      0.89      1633

Data Source

This model is trained on the 50 class version of the TREC dataset. http://search.r-project.org/library/textdata/html/dataset_trec.html