TREC(6) Question Classifier

Description

Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations or Numeric Values.

Predicted Entities

ABBR, DESC, NUM, ENTY, LOC, HUM.
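
For display purposes, the label codes can be mapped back to the category names listed in the description. The following is a minimal sketch; the `TREC6_LABELS` dictionary and the `describe_label` helper are illustrative names and are not part of Spark NLP.

```python
# Illustrative mapping from the predicted label codes to the category names above.
TREC6_LABELS = {
    "ABBR": "Abbreviation",
    "DESC": "Description",
    "ENTY": "Entities",
    "HUM": "Human Beings",
    "LOC": "Locations",
    "NUM": "Numeric Values",
}


def describe_label(code: str) -> str:
    """Return the category name for a label code, ignoring case and surrounding whitespace."""
    return TREC6_LABELS[code.strip().upper()]


print(describe_label("NUM"))
```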

How to use

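# Python (Spark NLP)
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel

spark = sparknlp.start()  # start (or reuse) a Spark session with Spark NLP loaded

# Turn the raw "text" column into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Compute one sentence embedding per document
use = UniversalSentenceEncoder.pretrained(lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")
# Pretrained TREC-6 question classifier
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec6', 'en') \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')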

// Scala (Spark NLP); assumes a SparkSession named `spark` is in scope, as in spark-shell
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// "tfhub_use" is the upstream dependency listed under Model Information
val use = UniversalSentenceEncoder.pretrained("tfhub_use", "en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("When did the construction of stone circles begin in the UK?").toDF("text")
val result = pipeline.fit(data).transform(data)

# Python (NLU library)
import nlu

text = ["""When did the construction of stone circles begin in the UK?"""]
trec6_df = nlu.load('en.classify.trec6.use').predict(text, output_level='document')
trec6_df[["document", "trec6"]]

Results

+------------------------------------------------------------------------------------------------+------------+
|document                                                                                        |class       |
+------------------------------------------------------------------------------------------------+------------+
|When did the construction of stone circles begin in the UK?                                     | NUM        |
+------------------------------------------------------------------------------------------------+------------+
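
To read the label out of the Python `annotations` object, a small helper can collect the `result` field of each annotation. This is a sketch under the assumption that `fullAnnotate` returns one dictionary per input text, keyed by output column, whose values are lists of annotation objects exposing a `result` attribute. The `predicted_labels` helper and the `FakeAnnotation` stand-in are hypothetical names used only so the example runs without Spark.

```python
from collections import namedtuple


def predicted_labels(annotations, column="class"):
    """Collect the predicted label(s) for each input text from a fullAnnotate-style result."""
    return [[ann.result for ann in doc[column]] for doc in annotations]


# Stand-in for Spark NLP annotation objects (assumption: they expose `.result`).
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])
sample = [{"class": [FakeAnnotation("NUM", {})]}]

labels = predicted_labels(sample)
print(labels)
```

With a real pipeline, the same call would be `predicted_labels(annotations)`, using the output column name set on the classifier.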

Model Information

| Property                | Value                           |
|-------------------------|---------------------------------|
| Model Name              | classifierdl_use_trec6          |
| Model Class             | ClassifierDLModel               |
| Spark Compatibility     | 2.5.0                           |
| Spark NLP Compatibility | 2.4                             |
| License                 | open source                     |
| Edition                 | public                          |
| Input Labels            | [document, sentence_embeddings] |
| Output Labels           | [class]                         |
| Language                | en                              |
| Upstream Dependencies   | tfhub_use                       |

Data Source

This model is trained on the six-class version of the TREC dataset: http://search.r-project.org/library/textdata/html/dataset_trec.html

Benchmarking

              precision    recall  f1-score   support

        ABBR       0.00      0.00      0.00        26
        DESC       0.89      0.96      0.92       343
        ENTY       0.86      0.86      0.86       391
         HUM       0.91      0.90      0.91       366
         LOC       0.88      0.91      0.89       233
         NUM       0.94      0.94      0.94       274

    accuracy                           0.89      1633
   macro avg       0.75      0.76      0.75      1633
weighted avg       0.88      0.89      0.89      1633
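
The macro-average row can be reproduced from the per-class rows as an unweighted mean over the six classes. The sketch below copies the rounded figures from the report; the `report` dictionary and the `macro_average` helper are illustrative names, and because the inputs are already rounded to two decimals the results only approximate the original computation.

```python
# Per-class (precision, recall, f1-score, support), copied from the report above.
report = {
    "ABBR": (0.00, 0.00, 0.00, 26),
    "DESC": (0.89, 0.96, 0.92, 343),
    "ENTY": (0.86, 0.86, 0.86, 391),
    "HUM": (0.91, 0.90, 0.91, 366),
    "LOC": (0.88, 0.91, 0.89, 233),
    "NUM": (0.94, 0.94, 0.94, 274),
}


def macro_average(rows, index):
    """Unweighted mean of one metric column across all classes."""
    values = [row[index] for row in rows.values()]
    return sum(values) / len(values)


macro_precision = macro_average(report, 0)
macro_recall = macro_average(report, 1)
macro_f1 = macro_average(report, 2)
total_support = sum(row[3] for row in report.values())

# prints "0.75 0.76 0.75 1633"
print(f"{macro_precision:.2f} {macro_recall:.2f} {macro_f1:.2f} {total_support}")
```

The ABBR class scores zero on all three metrics with only 26 examples, which is why the macro averages sit well below the weighted averages.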