Description
This model is trained on the public Few-NERD/inter dataset and extracts 66 fine-grained, general-scope entity types.
Predicted Entities
building-theater, art-other, location-bodiesofwater, other-god, organization-politicalparty, product-other, building-sportsfacility, building-restaurant, organization-sportsleague, event-election, organization-media/newspaper, product-software, other-educationaldegree, person-politician, person-soldier, other-disease, product-airplane, person-athlete, location-mountain, organization-company, other-biologything, location-other, other-livingthing, person-actor, organization-other, event-protest, art-film, other-award, other-astronomything, building-airport, product-food, person-other, event-disaster, product-weapon, event-sportsevent, location-park, product-ship, building-library, art-painting, building-other, other-currency, organization-education, person-scholar, organization-showorganization, person-artist/author, product-train, location-GPE, product-car, art-writtenart, event-attack/battle/war/militaryconflict, other-law, other-medical, organization-sportsteam, art-broadcastprogram, art-music, organization-government/governmentagency, other-language, event-other, person-director, other-chemicalthing, product-game, organization-religion, location-road/railway/highway/transit, location-island, building-hotel, building-hospital
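Each label follows a coarse-fine pattern, so the coarse type can be recovered by splitting on the first hyphen. The following is a minimal sketch using only a handful of the labels listed above; the helper name `split_label` is hypothetical and not part of Spark NLP.

```python
from collections import defaultdict


def split_label(label):
    """Split a 'coarse-fine' label on the first hyphen only."""
    coarse, fine = label.split("-", 1)
    return coarse, fine


# A small sample of the predicted entities, not the full set of 66.
sample_labels = [
    "building-theater",
    "art-film",
    "location-GPE",
    "organization-media/newspaper",
    "person-artist/author",
    "event-attack/battle/war/militaryconflict",
    "art-music",
]

# Group the fine-grained types under their coarse type.
by_coarse = defaultdict(list)
for label in sample_labels:
    coarse, fine = split_label(label)
    by_coarse[coarse].append(fine)

for coarse in sorted(by_coarse):
    print(coarse, by_coarse[coarse])
```

Splitting only once keeps fine types that contain slashes, such as attack/battle/war/militaryconflict, intact.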
How to use
...
# GloVe 100d word embeddings that feed the NER model
embeddings = WordEmbeddingsModel.pretrained("glove_100d", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")
# Pretrained Few-NERD sub-entity tagger
ner = NerDLModel.pretrained("nerdl_fewnerd_subentity_100d") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
# Merge token-level tags into entity chunks
ner_converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler, sentencer, tokenizer, embeddings, ner, ner_converter])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
l_model = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = l_model.fullAnnotate("""12 Corazones ('12 Hearts') is Spanish-language dating game show produced in the United States for the television network Telemundo since January 2005, based on its namesake Argentine TV show format. The show is filmed in Los Angeles and revolves around the twelve Zodiac signs that identify each contestant. In 2008, Ho filmed a cameo in the Steven Spielberg feature film The Cloverfield Paradox, as a news pundit.""")
...
// GloVe 100d word embeddings that feed the NER model
val embeddings = WordEmbeddingsModel.pretrained("glove_100d", "en")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = NerDLModel.pretrained("nerdl_fewnerd_subentity_100d")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
// Merge token-level tags into entity chunks
val ner_converter = new NerConverter()
  .setInputCols(Array("document", "token", "ner"))
  .setOutputCol("ner_chunk")
// The "sentence" column needs a sentence detector stage; it is assumed to be named `sentencer`, as in the Python pipeline
val pipeline = new Pipeline().setStages(Array(document_assembler, sentencer, tokenizer, embeddings, ner, ner_converter))
val data = Seq("12 Corazones ('12 Hearts') is Spanish-language dating game show produced in the United States for the television network Telemundo since January 2005, based on its namesake Argentine TV show format. The show is filmed in Los Angeles and revolves around the twelve Zodiac signs that identify each contestant. In 2008, Ho filmed a cameo in the Steven Spielberg feature film The Cloverfield Paradox, as a news pundit.").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.ner.fewnerd_subentity").predict("""12 Corazones ('12 Hearts') is Spanish-language dating game show produced in the United States for the television network Telemundo since January 2005, based on its namesake Argentine TV show format. The show is filmed in Los Angeles and revolves around the twelve Zodiac signs that identify each contestant. In 2008, Ho filmed a cameo in the Steven Spielberg feature film The Cloverfield Paradox, as a news pundit.""")
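To turn the `annotations` returned by `fullAnnotate` in the Python example into chunk and label pairs, something like the sketch below could be used. It rests on assumptions: that `fullAnnotate` returns one dict per input text keyed by output column, and that each chunk annotation exposes a `.result` string and a `.metadata` dict with an `"entity"` key. The helper `chunks_with_labels` is hypothetical, and the stand-in objects only mimic that shape so the snippet runs without Spark.

```python
from types import SimpleNamespace


def chunks_with_labels(annotations, chunk_col="ner_chunk"):
    """Return (chunk_text, entity_label) pairs from fullAnnotate-style output."""
    pairs = []
    for doc in annotations:  # assumed: one dict per input text
        for ann in doc.get(chunk_col, []):
            # assumed: the label is stored under the "entity" metadata key
            pairs.append((ann.result, ann.metadata.get("entity")))
    return pairs


# Stand-in data mimicking the assumed annotation shape (not real model output).
fake_annotations = [{
    "ner_chunk": [
        SimpleNamespace(result="Telemundo",
                        metadata={"entity": "organization-media/newspaper"}),
        SimpleNamespace(result="Los Angeles",
                        metadata={"entity": "location-GPE"}),
    ]
}]

for chunk, label in chunks_with_labels(fake_annotations):
    print(f"{chunk}\t{label}")
```

With a real pipeline, `chunks_with_labels(annotations)` would be called on the `fullAnnotate` result instead of the stand-in list.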
Results
+-----------------------+----------------------------+
|chunk                  |ner_label                   |
+-----------------------+----------------------------+
|Corazones ('12 Hearts')|art-broadcastprogram        |
|Spanish-language       |other-language              |
|United States          |location-GPE                |
|Telemundo              |organization-media/newspaper|
|Argentine TV           |organization-media/newspaper|
|Los Angeles            |location-GPE                |
|Steven Spielberg       |person-director             |
|Cloverfield Paradox    |art-film                    |
+-----------------------+----------------------------+
Model Information
Model Name: nerdl_fewnerd_subentity_100d
Type: ner
Compatibility: Spark NLP 3.1.1+
License: Open Source
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Data Source
Ding, Ning; Xu, Guangwei; Chen, Yulin; Wang, Xiaobin; Han, Xu; Xie, Pengjun; Zheng, Hai-Tao; Liu, Zhiyuan. Few-NERD: A Few-shot Named Entity Recognition Dataset. ACL-IJCNLP, 2021.
Benchmarking
+--------------------+-------+------+-------+-------+---------+------+------+
|              entity|     tp|    fp|     fn|  total|precision|recall|    f1|
+--------------------+-------+------+-------+-------+---------+------+------+
|            disaster|  309.0| 114.0|  287.0|  596.0|   0.7305|0.5185|0.6065|
|                film| 1589.0| 725.0|  810.0| 2399.0|   0.6867|0.6624|0.6743|
|            mountain|  851.0| 175.0|  431.0| 1282.0|   0.8294|0.6638|0.7374|
|            currency|  280.0|  66.0|  189.0|  469.0|   0.8092| 0.597|0.6871|
|             scholar|   31.0|  12.0|  413.0|  444.0|   0.7209|0.0698|0.1273|
|              island|  829.0| 165.0|  372.0| 1201.0|    0.834|0.6903|0.7554|
|      politicalparty|  242.0|  86.0|  283.0|  525.0|   0.7378| 0.461|0.5674|
|                ship|  461.0| 207.0|  311.0|  772.0|   0.6901|0.5972|0.6403|
|               award| 2234.0| 279.0| 1245.0| 3479.0|    0.889|0.6421|0.7457|
|    showorganization|  120.0| 201.0|  273.0|  393.0|   0.3738|0.3053|0.3361|
|            religion|  218.0| 117.0|  415.0|  633.0|   0.6507|0.3444|0.4504|
|           education| 5788.0| 852.0| 1001.0| 6789.0|   0.8717|0.8526| 0.862|
|                park|  259.0| 295.0|  176.0|  435.0|   0.4675|0.5954|0.5238|
|            painting|    0.0|   0.0|   14.0|   14.0|      0.0|   0.0|   0.0|
|               hotel|  570.0| 150.0|  254.0|  824.0|   0.7917|0.6917|0.7383|
|             library|  218.0|  92.0|  134.0|  352.0|   0.7032|0.6193|0.6586|
|         livingthing|  576.0| 280.0|  312.0|  888.0|   0.6729|0.6486|0.6606|
|   educationaldegree|  189.0|  31.0|   47.0|  236.0|   0.8591|0.8008|0.8289|
|            director|  673.0| 227.0|  507.0| 1180.0|   0.7478|0.5703|0.6471|
|                food|  474.0| 375.0|  341.0|  815.0|   0.5583|0.5816|0.5697|
|             athlete| 1181.0| 529.0|  540.0| 1721.0|   0.6906|0.6862|0.6884|
|            software|  922.0| 460.0|  493.0| 1415.0|   0.6671|0.6516|0.6593|
|             protest|  162.0| 212.0|  275.0|  437.0|   0.4332|0.3707|0.3995|
|               other|12555.0|7510.0|14369.0|26924.0|   0.6257|0.4663|0.5344|
|        sportsleague| 1439.0| 654.0|  842.0| 2281.0|   0.6875|0.6309| 0.658|
|            airplane| 1295.0| 442.0|  463.0| 1758.0|   0.7455|0.7366|0.7411|
|               train|  135.0| 111.0|  198.0|  333.0|   0.5488|0.4054|0.4663|
|        biologything| 1574.0| 625.0|  924.0| 2498.0|   0.7158|0.6301|0.6702|
|          politician| 3107.0|1545.0| 1688.0| 4795.0|   0.6679| 0.648|0.6578|
|               music|  419.0| 211.0|  182.0|  601.0|   0.6651|0.6972|0.6807|
|government/govern...|  564.0| 656.0|  511.0| 1075.0|   0.4623|0.5247|0.4915|
|     media/newspaper| 1600.0|1072.0|  893.0| 2493.0|   0.5988|0.6418|0.6196|
|               actor|  674.0| 161.0|  274.0|  948.0|   0.8072| 0.711| 0.756|
|            language|  698.0| 226.0|  335.0| 1033.0|   0.7554|0.6757|0.7133|
|       chemicalthing|  592.0| 231.0|  687.0| 1279.0|   0.7193|0.4629|0.5633|
|      sportsfacility|  870.0| 334.0|  291.0| 1161.0|   0.7226|0.7494|0.7357|
|            hospital|  226.0| 472.0|   49.0|  275.0|   0.3238|0.8218|0.4645|
|          writtenart|  297.0| 203.0|  450.0|  747.0|    0.594|0.3976|0.4763|
|road/railway/high...| 3238.0| 926.0| 1063.0| 4301.0|   0.7776|0.7528| 0.765|
|            election|   13.0|  13.0|  127.0|  140.0|      0.5|0.0929|0.1566|
|             soldier|  623.0| 537.0|  559.0| 1182.0|   0.5371|0.5271| 0.532|
|                 god|  332.0| 157.0|  414.0|  746.0|   0.6789| 0.445|0.5377|
|      astronomything| 1120.0| 353.0|  232.0| 1352.0|   0.7604|0.8284|0.7929|
|attack/battle/war...| 2516.0| 444.0|  590.0| 3106.0|     0.85|  0.81|0.8295|
|    broadcastprogram| 1056.0| 762.0|  811.0| 1867.0|   0.5809|0.5656|0.5731|
|             airport|  857.0|  96.0|  112.0|  969.0|   0.8993|0.8844|0.8918|
|             theater|   72.0|  31.0|  119.0|  191.0|    0.699| 0.377|0.4898|
|              weapon|  303.0| 190.0|  237.0|  540.0|   0.6146|0.5611|0.5866|
|             company| 5849.0|2632.0| 2570.0| 8419.0|   0.6897|0.6947|0.6922|
|                 car|  413.0| 293.0|  207.0|  620.0|    0.585|0.6661|0.6229|
|       artist/author| 4172.0|1953.0| 1777.0| 5949.0|   0.6811|0.7013|0.6911|
|             medical|   94.0| 112.0|  192.0|  286.0|   0.4563|0.3287|0.3821|
|             disease| 1009.0| 476.0|  447.0| 1456.0|   0.6795| 0.693|0.6862|
|                game|  141.0| 120.0|  264.0|  405.0|   0.5402|0.3481|0.4234|
|         sportsevent| 1042.0| 553.0|  552.0| 1594.0|   0.6533|0.6537|0.6535|
|          sportsteam| 3657.0|1133.0| 1301.0| 4958.0|   0.7635|0.7376|0.7503|
|          restaurant|  285.0| 444.0|  201.0|  486.0|   0.3909|0.5864|0.4691|
|       bodiesofwater|  314.0|  91.0|  343.0|  657.0|   0.7753|0.4779|0.5913|
|                 law| 1626.0| 583.0|  329.0| 1955.0|   0.7361|0.8317| 0.781|
|                 GPE|22173.0|5585.0| 3839.0|26012.0|   0.7988|0.8524|0.8247|
+--------------------+-------+------+-------+-------+---------+------+------+
+-----------------+
|            macro|
+-----------------+
|0.608599546406531|
+-----------------+
+-----------------+
|            micro|
+-----------------+
|0.684720504256685|
+-----------------+
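The per-entity precision, recall and f1 columns can be reproduced from the tp, fp and fn counts, and the total column appears to equal tp + fn. The sketch below uses three rows copied from the benchmark table and shows the usual definitions of macro and micro averaging. It is an illustration under the assumption that the reported macro and micro figures are F1 averages computed this way over all rows; the card does not state how they were obtained.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (tp, fp, fn) for three rows of the benchmark table.
rows = {
    "disaster": (309, 114, 287),
    "film": (1589, 725, 810),
    "painting": (0, 0, 14),
}

per_entity = {name: prf(*counts) for name, counts in rows.items()}

# Macro average: unweighted mean of the per-entity F1 scores.
macro_f1 = sum(f1 for _, _, f1 in per_entity.values()) / len(per_entity)

# Micro average: pool the counts first, then compute F1 once.
tp_sum = sum(c[0] for c in rows.values())
fp_sum = sum(c[1] for c in rows.values())
fn_sum = sum(c[2] for c in rows.values())
micro_f1 = prf(tp_sum, fp_sum, fn_sum)[2]

for name, (p, r, f1) in per_entity.items():
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"macro_f1={macro_f1:.4f} micro_f1={micro_f1:.4f}")
```

Macro averaging gives a rare class such as painting the same weight as a frequent one, while micro averaging is dominated by high-count classes; that is one plausible reason for the gap between the two figures above.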