Description
Classify open-domain, fact-based questions into fine-grained subcategories of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations, or Numeric Values.
Predicted Entities
ENTY_animal, ENTY_body, ENTY_color, ENTY_cremat, ENTY_currency, ENTY_dismed, ENTY_event, ENTY_food, ENTY_instru, ENTY_lang, ENTY_letter, ENTY_other, ENTY_plant, ENTY_product, ENTY_religion, ENTY_sport, ENTY_substance, ENTY_symbol, ENTY_techmeth, ENTY_termeq, ENTY_veh, ENTY_word, DESC_def, DESC_desc, DESC_manner, DESC_reason, HUM_gr, HUM_ind, HUM_title, HUM_desc, LOC_city, LOC_country, LOC_mount, LOC_other, LOC_state, NUM_code, NUM_count, NUM_date, NUM_dist, NUM_money, NUM_ord, NUM_other, NUM_period, NUM_perc, NUM_speed, NUM_temp, NUM_volsize, NUM_weight, ABBR_abb, ABBR_exp.
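Each fine-grained label encodes its coarse TREC category as a prefix: for example, NUM_date is the "date" subcategory of the Numeric class. A minimal sketch, assuming only the label strings listed above, for splitting a predicted label back into its coarse and fine parts:

# A fine-grained label has the shape "<COARSE>_<fine>"; split on the first "_".
coarse, fine = "NUM_date".split("_", 1)
print(coarse, fine)  # NUM date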
How to use
Python:
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel

# Turn raw text into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Embed each document with the Universal Sentence Encoder.
use = UniversalSentenceEncoder.pretrained(lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# Classify the embeddings with the pretrained TREC-50 model.
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec50', 'en') \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')
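fullAnnotate returns one dictionary per input text, keyed by output column name; each value is a list of Annotation objects whose result field holds the predicted label. A minimal sketch for reading the prediction, assuming the pipeline above:

# Read the predicted class label from the first (and only) annotated text.
print(annotations[0]["class"][0].result)  # expected: NUM_date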
Scala:
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val use = UniversalSentenceEncoder.pretrained(lang = "en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")

val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec50", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("When did the construction of stone circles begin in the UK?").toDF("text")
val result = pipeline.fit(data).transform(data)
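In the transformed DataFrame, the class column holds annotation structs whose result field is the predicted label. A minimal sketch for inspecting it:

// Show the predicted label(s) for each input row.
result.select("class.result").show(false)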
NLU:
import nlu

text = ["""When did the construction of stone circles begin in the UK?"""]
# Load the model through NLU and return one prediction per document.
trec50_df = nlu.load('en.classify.trec50.use').predict(text, output_level="document")
trec50_df[["document", "trec50"]]
Results
+------------------------------------------------------------------------------------------------+------------+
|document |class |
+------------------------------------------------------------------------------------------------+------------+
|When did the construction of stone circles begin in the UK? | NUM_date |
+------------------------------------------------------------------------------------------------+------------+
Model Information
Model Name: classifierdl_use_trec50
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Data Source
This model was trained on the 50-class (fine-grained) version of the TREC question classification dataset: http://search.r-project.org/library/textdata/html/dataset_trec.html
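For a quick look at the data itself, one convenient access path (an assumption for illustration; the link above points to the same corpus packaged for R's textdata) is the Hugging Face datasets library:

# Hedged sketch: load the TREC question classification corpus and
# print one example with its labels.
from datasets import load_dataset

trec = load_dataset("trec")
print(trec["train"][0])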