Description
Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations, or Numeric Values.
Predicted Entities
ABBR, DESC, NUM, ENTY, LOC, HUM
How to use
import sparknlp
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
use = UniversalSentenceEncoder.pretrained(lang="en") \
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec6', 'en') \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")
nlpPipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline model.
light_pipeline = LightPipeline(nlpPipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // required for Seq(...).toDF

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val use = UniversalSentenceEncoder.pretrained(lang="en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))
val data = Seq("When did the construction of stone circles begin in the UK?").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
text = ["""When did the construction of stone circles begin in the UK?"""]
trec6_df = nlu.load('en.classify.trec6.use').predict(text, output_level='document')
trec6_df[["document", "trec6"]]
Results
+------------------------------------------------------------+-----+
|document                                                    |class|
+------------------------------------------------------------+-----+
|When did the construction of stone circles begin in the UK? |NUM  |
+------------------------------------------------------------+-----+
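The class column holds the short TREC tag, not the category name. The sketch below shows one way to read the tag from a single fullAnnotate row and map it to the names given in the Description. It assumes each row is a dict from output column name to a list of annotation objects with a result attribute. FakeAnnotation, CATEGORY_NAMES, and readable_category are hypothetical names for illustration only, and the stand-in object takes the place of a live Spark NLP session.

```python
from collections import namedtuple

# Tag-to-name mapping taken from the Description and Predicted Entities lists.
CATEGORY_NAMES = {
    "ABBR": "Abbreviation",
    "DESC": "Description",
    "ENTY": "Entities",
    "HUM": "Human Beings",
    "LOC": "Locations",
    "NUM": "Numeric Values",
}

# Hypothetical stand-in for a Spark NLP annotation; only `result` is used here.
FakeAnnotation = namedtuple("FakeAnnotation", ["result"])


def readable_category(row, output_col="class"):
    """Return (tag, name) for the first annotation in `output_col` of one row."""
    tag = row[output_col][0].result.strip()
    return tag, CATEGORY_NAMES.get(tag, "Unknown")


# Assumed shape of one element of the list returned by fullAnnotate(...).
example_row = {"class": [FakeAnnotation(result="NUM")]}
tag, name = readable_category(example_row)
print(tag, name)
```

With a real pipeline, the same helper would be called on annotations[0] instead of example_row, provided the assumed row shape holds.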
Model Information
Model Name: | classifierdl_use_trec6 |
Compatibility: | Spark NLP 2.7.1+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [class] |
Language: | en |
Benchmarking
              precision    recall  f1-score   support

        ABBR       0.00      0.00      0.00        26
        DESC       0.89      0.96      0.92       343
        ENTY       0.86      0.86      0.86       391
         HUM       0.91      0.90      0.91       366
         LOC       0.88      0.91      0.89       233
         NUM       0.94      0.94      0.94       274

    accuracy                           0.89      1633
   macro avg       0.75      0.76      0.75      1633
weighted avg       0.88      0.89      0.89      1633
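The macro and weighted averages in the table differ by more than ten points, and the per-class rows explain why. The sketch below recomputes both averages from the rounded per-class figures in the table. REPORT, macro_average, and weighted_average are hypothetical names introduced for this illustration, and because the inputs are already rounded to two decimals, the recomputed values match the table only to within rounding.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support).
REPORT = {
    "ABBR": (0.00, 0.00, 0.00, 26),
    "DESC": (0.89, 0.96, 0.92, 343),
    "ENTY": (0.86, 0.86, 0.86, 391),
    "HUM": (0.91, 0.90, 0.91, 366),
    "LOC": (0.88, 0.91, 0.89, 233),
    "NUM": (0.94, 0.94, 0.94, 274),
}

SUPPORT = 3  # index of the support column in each row


def macro_average(report, column):
    """Unweighted mean over classes: every class counts equally."""
    values = [row[column] for row in report.values()]
    return sum(values) / len(values)


def weighted_average(report, column):
    """Mean over classes weighted by support (number of test examples)."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[column] * row[SUPPORT] for row in report.values()) / total


total_support = sum(row[SUPPORT] for row in REPORT.values())
print("total support:", total_support)
for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    macro = macro_average(REPORT, col)
    weighted = weighted_average(REPORT, col)
    print(f"{name:>9}  macro={macro:.3f}  weighted={weighted:.3f}")
```

ABBR scores 0.00 on all three metrics but has only 26 of the 1633 test examples, so it pulls the macro average down to about 0.75 while barely affecting the weighted average.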
Data Source
This model is trained on the 6-class version of the TREC dataset, matching the six labels listed under Predicted Entities: http://search.r-project.org/library/textdata/html/dataset_trec.html