Description
Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations or Numeric Values.
Predicted Entities
ABBR, DESC, NUM, ENTY, LOC, HUM.
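For readers who want to show a human-readable category instead of the raw label, the sketch below maps each predicted label to the category name listed in the description. The pairing of codes to names (for example `ENTY` with Entities) is inferred from the abbreviations, and `TREC6_LABELS` and `describe_label` are hypothetical names introduced here, not part of Spark NLP.

```python
# Hypothetical lookup table: label codes from "Predicted Entities" paired with
# the category names from the description (pairing inferred from the abbreviations).
TREC6_LABELS = {
    "ABBR": "Abbreviation",
    "DESC": "Description",
    "ENTY": "Entities",
    "HUM": "Human Beings",
    "LOC": "Locations",
    "NUM": "Numeric Values",
}


def describe_label(code):
    """Return the category name for a label code, ignoring case and padding."""
    return TREC6_LABELS[code.strip().upper()]


print(describe_label(" NUM "))
```

An unknown code raises `KeyError`, which is deliberate here: a label outside these six would mean a different model is loaded.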
How to use
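# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel

# Assumes an active Spark NLP session named `spark`, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
use = UniversalSentenceEncoder.pretrained(lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec6', 'en') \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")
nlp_pipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
# Every stage is pretrained, so fitting on an empty DataFrame is enough
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')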
// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq; assumes a SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// "tfhub_use" is the upstream dependency listed under Model Information
val use = UniversalSentenceEncoder.pretrained("tfhub_use", "en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))
val data = Seq("When did the construction of stone circles begin in the UK?").toDF("text")
val result = pipeline.fit(data).transform(data)
# Python, using the NLU library
import nlu

text = ["""When did the construction of stone circles begin in the UK?"""]
trec6_df = nlu.load('en.classify.trec6.use').predict(text, output_level='document')
trec6_df[["document", "trec6"]]
Results
+-----------------------------------------------------------+-----+
|document                                                   |class|
+-----------------------------------------------------------+-----+
|When did the construction of stone circles begin in the UK?|NUM  |
+-----------------------------------------------------------+-----+
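The Python example stores the LightPipeline output in `annotations`. As a minimal sketch of reading the predicted label out of it, the helper below assumes that `fullAnnotate` returns one dictionary per input text, keyed by output column, with each value a list of annotation objects exposing a `result` attribute. `predicted_label` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in so the snippet runs without Spark; it is not real Spark NLP output.

```python
from types import SimpleNamespace


def predicted_label(annotations, output_col="class"):
    """Return the predicted label for the first input text.

    Assumes `annotations` is a list with one dict per input text, and that
    annotations[0][output_col] is a list of objects with a `.result` attribute.
    """
    first_text = annotations[0]
    class_annotations = first_text[output_col]
    return class_annotations[0].result


# Stand-in with the assumed shape, used here instead of a live pipeline.
stand_in = [{"class": [SimpleNamespace(result="NUM")]}]
print(predicted_label(stand_in))
```

With a live pipeline, the same call would be `predicted_label(annotations)`, where `"class"` matches the name passed to `setOutputCol` on the classifier.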
Model Information
|---|---|
| Model Name | classifierdl_use_trec6 |
| Model Class | ClassifierDLModel |
| Spark Compatibility | 2.5.0 |
| Spark NLP Compatibility | 2.4 |
| License | open source |
| Edition | public |
| Input Labels | [document, sentence_embeddings] |
| Output Labels | [class] |
| Language | en |
| Upstream Dependencies | tfhub_use |
Data Source
This model is trained on the six-class version of the TREC dataset: http://search.r-project.org/library/textdata/html/dataset_trec.html
Benchmarking
              precision    recall  f1-score   support

        ABBR       0.00      0.00      0.00        26
        DESC       0.89      0.96      0.92       343
        ENTY       0.86      0.86      0.86       391
         HUM       0.91      0.90      0.91       366
         LOC       0.88      0.91      0.89       233
         NUM       0.94      0.94      0.94       274

    accuracy                           0.89      1633
   macro avg       0.75      0.76      0.75      1633
weighted avg       0.88      0.89      0.89      1633
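To make the summary rows easier to interpret, the sketch below recomputes F1, the macro average, and the support-weighted average from the per-class precision, recall, and support reported above. It is an illustration, not the evaluation script used for this model. Because the inputs are already rounded to two decimals, recomputed values can differ from the table in the last digit.

```python
# Per-class (precision, recall, support), copied from the benchmark table.
REPORT = {
    "ABBR": (0.00, 0.00, 26),
    "DESC": (0.89, 0.96, 343),
    "ENTY": (0.86, 0.86, 391),
    "HUM": (0.91, 0.90, 366),
    "LOC": (0.88, 0.91, 233),
    "NUM": (0.94, 0.94, 274),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall; taken as 0 when both are 0."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def macro_avg(values):
    """Unweighted mean: every class counts equally."""
    values = list(values)
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by class support: frequent classes count more."""
    values, weights = list(values), list(weights)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [s for _, _, s in REPORT.values()]
precisions = [p for p, _, _ in REPORT.values()]
recalls = [r for _, r, _ in REPORT.values()]
f1_scores = [f1(p, r) for p, r, _ in REPORT.values()]

total_support = sum(supports)
macro_precision = macro_avg(precisions)
macro_recall = macro_avg(recalls)
macro_f1 = macro_avg(f1_scores)
weighted_precision = weighted_avg(precisions, supports)

print(f"support={total_support} macro P={macro_precision:.2f} "
      f"macro R={macro_recall:.2f} weighted P={weighted_precision:.2f}")
```

The macro average sits well below the weighted average because ABBR scores 0.00 on every metric: it counts as one sixth of the macro average, but its 26 examples carry little weight in the support-weighted one.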