Description
Extracts key aspects from user questions, such as ‘arrive_date’ (the arrival date of a flight) and ‘fare_amount’ (fare information), along with 77 other aspect and entity types, to automate question answering.
Predicted Entities
aircraft_code, airline_code, airline_name, airport_code, airport_name, arrive_date.date_relative, arrive_date.day_name, arrive_date.day_number, arrive_date.month_name, arrive_date.today_relative, arrive_time.end_time, arrive_time.period_mod, arrive_time.period_of_day, arrive_time.start_time, arrive_time.time, arrive_time.time_relative, city_name, class_type, connect, cost_relative, day_name, day_number, days_code, depart_date.date_relative, depart_date.day_name, depart_date.day_number, depart_date.month_name, depart_date.today_relative, depart_date.year, depart_time.end_time, depart_time.period_mod, depart_time.period_of_day, depart_time.start_time, depart_time.time, depart_time.time_relative, economy, fare_amount, fare_basis_code, flight_days, flight_mod, flight_number, flight_stop, flight_time, fromloc.airport_code, fromloc.airport_name, fromloc.city_name, fromloc.state_code, fromloc.state_name, meal, meal_code, meal_description, mod, month_name, or, period_of_day, restriction_code, return_date.date_relative, return_date.day_name, return_date.day_number, return_date.month_name, return_date.today_relative, return_time.period_mod, return_time.period_of_day, round_trip, state_code, state_name, stoploc.airport_name, stoploc.city_name, stoploc.state_code, time, time_relative, today_relative, toloc.airport_code, toloc.airport_name, toloc.city_name, toloc.country_name, toloc.state_code, toloc.state_name, transport_type.
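Many of these labels have the form prefix.suffix, where the prefix (for example fromloc, toloc, or depart_date) appears to name a role and the suffix a slot type. The Python sketch below, which assumes that reading of the label names, groups extracted chunks by role so they can be passed to a downstream question-answering step. The helper `group_by_role` is hypothetical and not part of Spark NLP.

```python
from collections import defaultdict


def group_by_role(chunks):
    """Group (text, label) pairs by the label part before the last dot.

    Labels without a dot (e.g. "round_trip") are stored under the key "".
    This is an illustrative helper, not a Spark NLP API.
    """
    grouped = defaultdict(dict)
    for text, label in chunks:
        # rpartition returns ("", "", label) when the label has no dot
        role, _, slot = label.rpartition(".")
        grouped[role][slot] = text
    return dict(grouped)


chunks = [
    ("baltimore", "fromloc.city_name"),
    ("dallas", "toloc.city_name"),
    ("round trip", "round_trip"),
]
slots = group_by_role(chunks)
```

With this input, `slots["fromloc"]["city_name"]` holds the origin city and `slots["toloc"]["city_name"]` the destination.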
How to use
Python:
...
embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")
ner = NerDLModel.pretrained("nerdl_atis_840b_300d", "en") \
    .setInputCols(["document", "token", "embeddings"]) \
    .setOutputCol("ner")
...
pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, ner, ner_converter])
example = spark.createDataFrame([['i want to fly from baltimore to dallas round trip']], ["text"])
result = pipeline.fit(example).transform(example)

Scala:
...
val embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")
val ner = NerDLModel.pretrained("nerdl_atis_840b_300d", "en")
  .setInputCols(Array("document", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings, ner, ner_converter))
val data = Seq("i want to fly from baltimore to dallas round trip").toDF("text")
val result = pipeline.fit(data).transform(data)

NLU:
import nlu
nlu.load("en.ner.airline").predict("""i want to fly from baltimore to dallas round trip""")
Results
+-------------------+---------------------+
| ner_chunk | label |
+-------------------+---------------------+
| baltimore | fromloc.city_name |
| dallas | toloc.city_name |
| round trip | round_trip |
+-------------------+---------------------+
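The model is benchmarked on token-level labels with B- and I- prefixes, while the table above shows merged chunks. The Python sketch below illustrates how such IOB tags can be merged into chunks of this kind. It is a simplified illustration, not Spark NLP's converter implementation, and the per-token tags in the example are assumed for demonstration rather than taken from actual model output.

```python
def iob_to_chunks(tokens, tags):
    """Merge token-level IOB tags into (chunk_text, label) pairs.

    A chunk starts at a "B-" tag, or at an "I-" tag whose label differs
    from the chunk currently being built, and ends at the next "O" tag
    or the next chunk start.
    """
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk being built, if there is one
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            current_tokens.append(token)
        else:  # "O": the token is outside any entity
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


tokens = "i want to fly from baltimore to dallas round trip".split()
# Assumed tags for illustration only
tags = ["O", "O", "O", "O", "O", "B-fromloc.city_name", "O",
        "B-toloc.city_name", "B-round_trip", "I-round_trip"]
chunks = iob_to_chunks(tokens, tags)
```

For these assumed tags, `chunks` contains the same three chunk and label pairs as the table above.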
Model Information
| Model Name: | nerdl_atis_840b_300d |
| Type: | ner |
| Compatibility: | Spark NLP 2.7.1+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [document, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
| Dependencies: | glove_840B_300 |
Data Source
This model was trained on the ATIS (Airline Travel Information System) dataset: https://www.kaggle.com/hassanamin/atis-airlinetravelinformationsystem
Benchmarking
| | label | tp | fp | fn | prec | rec | f1 |
|----:|-----------------------------:|------:|-----:|-----:|---------:|---------:|---------:|
| 0 | B-airline_name | 101 | 5 | 0 | 0.95283 | 1 | 0.975845 |
| 1 | B-booking_class | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | B-toloc.state_code | 18 | 0 | 0 | 1 | 1 | 1 |
| 3 | B-arrive_time.period_of_day | 6 | 2 | 0 | 0.75 | 1 | 0.857143 |
| 4 | B-flight_number | 11 | 5 | 0 | 0.6875 | 1 | 0.814815 |
| 5 | B-depart_date.year | 3 | 0 | 0 | 1 | 1 | 1 |
| 6 | I-depart_time.time_relative | 0 | 0 | 1 | 0 | 0 | 0 |
| 7 | I-depart_time.time | 52 | 1 | 0 | 0.981132 | 1 | 0.990476 |
| 8 | B-transport_type | 10 | 0 | 0 | 1 | 1 | 1 |
| 9 | I-depart_date.day_number | 14 | 1 | 1 | 0.933333 | 0.933333 | 0.933333 |
| 10 | I-city_name | 12 | 1 | 18 | 0.923077 | 0.4 | 0.55814 |
| 11 | B-depart_time.end_time | 2 | 0 | 1 | 1 | 0.666667 | 0.8 |
| 12 | B-depart_date.date_relative | 17 | 2 | 0 | 0.894737 | 1 | 0.944445 |
| 13 | B-toloc.country_name | 1 | 0 | 0 | 1 | 1 | 1 |
| 14 | I-depart_time.start_time | 1 | 0 | 0 | 1 | 1 | 1 |
| 15 | B-flight_stop | 21 | 0 | 0 | 1 | 1 | 1 |
| 16 | B-airline_code | 27 | 1 | 7 | 0.964286 | 0.794118 | 0.870968 |
| 17 | B-stoploc.city_name | 20 | 1 | 0 | 0.952381 | 1 | 0.97561 |
| 18 | I-fromloc.airport_name | 15 | 19 | 0 | 0.441176 | 1 | 0.612245 |
| 19 | B-round_trip | 70 | 0 | 3 | 1 | 0.958904 | 0.979021 |
| 20 | B-day_name | 1 | 0 | 1 | 1 | 0.5 | 0.666667 |
| 21 | B-depart_time.time_relative | 63 | 2 | 2 | 0.969231 | 0.969231 | 0.969231 |
| 22 | B-fromloc.airport_name | 10 | 12 | 2 | 0.454545 | 0.833333 | 0.588235 |
| 23 | B-meal_code | 0 | 1 | 1 | 0 | 0 | 0 |
| 24 | B-fromloc.airport_code | 5 | 6 | 0 | 0.454545 | 1 | 0.625 |
| 25 | B-arrive_time.start_time | 8 | 1 | 0 | 0.888889 | 1 | 0.941177 |
| 26 | I-state_name | 0 | 0 | 1 | 0 | 0 | 0 |
| 27 | I-arrive_time.time | 34 | 0 | 1 | 1 | 0.971429 | 0.985507 |
| 28 | B-airport_code | 3 | 1 | 6 | 0.75 | 0.333333 | 0.461538 |
| 29 | B-depart_time.start_time | 2 | 0 | 1 | 1 | 0.666667 | 0.8 |
| 30 | B-arrive_time.time_relative | 27 | 3 | 4 | 0.9 | 0.870968 | 0.885246 |
| 31 | I-cost_relative | 2 | 0 | 1 | 1 | 0.666667 | 0.8 |
| 32 | B-flight | 0 | 0 | 1 | 0 | 0 | 0 |
| 33 | B-toloc.state_name | 25 | 3 | 3 | 0.892857 | 0.892857 | 0.892857 |
| 34 | I-arrive_time.start_time | 1 | 0 | 0 | 1 | 1 | 1 |
| 35 | B-compartment | 0 | 0 | 1 | 0 | 0 | 0 |
| 36 | B-toloc.airport_code | 4 | 0 | 0 | 1 | 1 | 1 |
| 37 | B-connect | 6 | 0 | 0 | 1 | 1 | 1 |
| 38 | I-return_date.date_relative | 2 | 0 | 1 | 1 | 0.666667 | 0.8 |
| 39 | I-depart_time.end_time | 2 | 0 | 1 | 1 | 0.666667 | 0.8 |
| 40 | B-meal | 16 | 1 | 0 | 0.941176 | 1 | 0.969697 |
| 41 | B-arrive_date.month_name | 5 | 2 | 1 | 0.714286 | 0.833333 | 0.769231 |
| 42 | B-arrive_date.day_number | 5 | 2 | 1 | 0.714286 | 0.833333 | 0.769231 |
| 43 | I-toloc.state_name | 1 | 0 | 0 | 1 | 1 | 1 |
| 44 | B-arrive_date.date_relative | 2 | 0 | 0 | 1 | 1 | 1 |
| 45 | I-fromloc.state_name | 1 | 0 | 0 | 1 | 1 | 1 |
| 46 | B-depart_time.period_of_day | 121 | 1 | 9 | 0.991803 | 0.930769 | 0.960317 |
| 47 | I-toloc.airport_name | 3 | 2 | 0 | 0.6 | 1 | 0.75 |
| 48 | B-class_type | 24 | 1 | 0 | 0.96 | 1 | 0.979592 |
| 49 | B-period_of_day | 3 | 0 | 1 | 1 | 0.75 | 0.857143 |
| 50 | I-flight_number | 0 | 0 | 1 | 0 | 0 | 0 |
| 51 | B-stoploc.airport_code | 0 | 0 | 1 | 0 | 0 | 0 |
| 52 | B-flight_mod | 23 | 0 | 1 | 1 | 0.958333 | 0.978723 |
| 53 | I-airport_name | 13 | 2 | 16 | 0.866667 | 0.448276 | 0.590909 |
| 54 | B-arrive_time.time | 33 | 2 | 1 | 0.942857 | 0.970588 | 0.956522 |
| 55 | B-arrive_date.day_name | 11 | 3 | 0 | 0.785714 | 1 | 0.88 |
| 56 | I-restriction_code | 3 | 0 | 0 | 1 | 1 | 1 |
| 57 | I-flight_time | 1 | 0 | 0 | 1 | 1 | 1 |
| 58 | B-meal_description | 10 | 0 | 0 | 1 | 1 | 1 |
| 59 | I-flight_mod | 1 | 0 | 5 | 1 | 0.166667 | 0.285714 |
| 60 | B-flight_days | 10 | 0 | 0 | 1 | 1 | 1 |
| 61 | I-stoploc.city_name | 10 | 2 | 0 | 0.833333 | 1 | 0.909091 |
| 62 | B-economy | 6 | 0 | 0 | 1 | 1 | 1 |
| 63 | I-arrive_date.day_number | 0 | 1 | 0 | 0 | 0 | 0 |
| 64 | B-toloc.airport_name | 3 | 1 | 0 | 0.75 | 1 | 0.857143 |
| 65 | I-class_type | 17 | 0 | 0 | 1 | 1 | 1 |
| 66 | B-state_code | 1 | 0 | 0 | 1 | 1 | 1 |
| 67 | B-aircraft_code | 27 | 0 | 6 | 1 | 0.818182 | 0.9 |
| 68 | B-days_code | 0 | 0 | 1 | 0 | 0 | 0 |
| 69 | B-or | 3 | 3 | 0 | 0.5 | 1 | 0.666667 |
| 70 | B-return_date.date_relative | 1 | 1 | 2 | 0.5 | 0.333333 | 0.4 |
| 71 | B-fromloc.state_name | 17 | 0 | 0 | 1 | 1 | 1 |
| 72 | B-depart_date.month_name | 54 | 1 | 2 | 0.981818 | 0.964286 | 0.972973 |
| 73 | B-fromloc.city_name | 703 | 6 | 1 | 0.991537 | 0.99858 | 0.995046 |
| 74 | B-restriction_code | 4 | 2 | 0 | 0.666667 | 1 | 0.8 |
| 75 | B-state_name | 0 | 0 | 9 | 0 | 0 | 0 |
| 76 | B-city_name | 32 | 4 | 25 | 0.888889 | 0.561404 | 0.688172 |
| 77 | I-round_trip | 70 | 0 | 1 | 1 | 0.985915 | 0.992908 |
| 78 | I-transport_type | 0 | 0 | 1 | 0 | 0 | 0 |
| 79 | B-depart_date.day_name | 210 | 2 | 2 | 0.990566 | 0.990566 | 0.990566 |
| 80 | B-fare_basis_code | 17 | 4 | 0 | 0.809524 | 1 | 0.894737 |
| 81 | I-arrive_time.end_time | 8 | 1 | 0 | 0.888889 | 1 | 0.941177 |
| 82 | B-depart_date.day_number | 53 | 1 | 2 | 0.981482 | 0.963636 | 0.972477 |
| 83 | B-return_date.day_name | 0 | 0 | 2 | 0 | 0 | 0 |
| 84 | B-toloc.city_name | 713 | 21 | 3 | 0.97139 | 0.99581 | 0.983448 |
| 85 | I-fare_amount | 2 | 0 | 0 | 1 | 1 | 1 |
| 86 | B-fromloc.state_code | 23 | 0 | 0 | 1 | 1 | 1 |
| 87 | B-cost_relative | 36 | 0 | 1 | 1 | 0.972973 | 0.986301 |
| 88 | I-arrive_time.time_relative | 0 | 0 | 4 | 0 | 0 | 0 |
| 89 | B-mod | 0 | 0 | 2 | 0 | 0 | 0 |
| 90 | B-flight_time | 1 | 0 | 0 | 1 | 1 | 1 |
| 91 | B-arrive_time.end_time | 8 | 1 | 0 | 0.888889 | 1 | 0.941177 |
| 92 | B-fare_amount | 2 | 0 | 0 | 1 | 1 | 1 |
| 93 | I-toloc.city_name | 263 | 12 | 2 | 0.956364 | 0.992453 | 0.974074 |
| 94 | I-fromloc.city_name | 176 | 1 | 1 | 0.99435 | 0.99435 | 0.99435 |
| 95 | B-airport_name | 10 | 2 | 11 | 0.833333 | 0.47619 | 0.606061 |
| 96 | I-airline_name | 65 | 0 | 0 | 1 | 1 | 1 |
| 97 | I-depart_time.period_of_day | 0 | 0 | 1 | 0 | 0 | 0 |
| 98 | B-depart_date.today_relative | 8 | 0 | 1 | 1 | 0.888889 | 0.941177 |
| 99 | B-depart_time.time | 57 | 8 | 0 | 0.876923 | 1 | 0.934426 |
| 100 | B-depart_time.period_mod | 5 | 0 | 0 | 1 | 1 | 1 |
| 101 | Macro-average | 3487 | 157 | 176 | 0.768428 | 0.758601 | 0.763483 |
| 102 | Micro-average | 3487 | 157 | 176 | 0.956916 | 0.951952 | 0.954427 |
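The prec, rec, and f1 columns follow from the tp, fp, and fn counts. The Python sketch below recomputes them for a few rows copied from the table and for the micro-average, which pools the counts over all labels. The function name `prf` is illustrative. The last lines check an observation rather than a documented fact: the reported macro-average f1 appears to equal the harmonic mean of the macro-averaged precision and recall.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts.

    A ratio with a zero denominator is reported as 0.0, which matches
    the rows in the table that have no true positives.
    """
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


# (tp, fp, fn) counts copied from the benchmark table
rows = {
    "B-airline_name": (101, 5, 0),
    "B-airline_code": (27, 1, 7),
    "B-booking_class": (0, 0, 1),
}
per_label = {label: prf(*counts) for label, counts in rows.items()}

# Micro-average: apply the same formulas to the pooled counts
micro = prf(3487, 157, 176)

# Observation (an assumption, not stated by the source): the macro f1
# matches the harmonic mean of the macro precision and macro recall
macro_prec, macro_rec = 0.768428, 0.758601
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)
```

Labels with few or no correct predictions in the test set score 0 and count equally in the macro-average, which is why it sits well below the micro-average.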