Description
This model is trained on the Few-NERD/inter public dataset and extracts 8 general-scope entity types.
Predicted Entities
PERSON, ORGANIZATION, LOCATION, ART, BUILDING, PRODUCT, EVENT, OTHER
How to use
The model was trained with glove_100d word embeddings, so you should use the same embeddings in your NLP pipeline.
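from pyspark.ml import Pipeline
from sparknlp.annotator import WordEmbeddingsModel, NerDLModel, NerConverter
from sparknlp.base import LightPipeline
...
# Use the same glove_100d embeddings the NER model was trained with.
embeddings = WordEmbeddingsModel.pretrained("glove_100d", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")
ner = NerDLModel.pretrained("nerdl_fewnerd_100d") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
# Merge the token-level NER tags into entity chunks.
ner_converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler, sentencer, tokenizer, embeddings, ner, ner_converter])
# All stages are pretrained, so fitting on an empty DataFrame is enough; LightPipeline annotates strings on the driver.
l_model = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))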
annotations = l_model.fullAnnotate("""The Double Down is a sandwich offered by Kentucky Fried Chicken (KFC) restaurants. He did not see active service again until 1882, when he took part in the Anglo-Egyptian War, and was present at the battle of Tell El Kebir (September 1882), for which he was mentioned in dispatches, received the Egypt Medal with clasp and the 3rd class of the Order of Medjidie, and was appointed a Companion of the Order of the Bath (CB).""")
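import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
...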
val embeddings = WordEmbeddingsModel.pretrained("glove_100d", "en")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = NerDLModel.pretrained("nerdl_fewnerd_100d")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
// NerConverter is a class, so it must be instantiated with `new`.
val ner_converter = new NerConverter()
  .setInputCols(Array("document", "token", "ner"))
  .setOutputCol("ner_chunk")
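// The sentence detector must run before the stages that read the "sentence" column (named sentencer, as in the Python example).
val pipeline = new Pipeline().setStages(Array(document_assembler, sentencer, tokenizer, embeddings, ner, ner_converter))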
val data = Seq("The Double Down is a sandwich offered by Kentucky Fried Chicken (KFC) restaurants. He did not see active service again until 1882, when he took part in the Anglo-Egyptian War, and was present at the battle of Tell El Kebir (September 1882), for which he was mentioned in dispatches, received the Egypt Medal with clasp and the 3rd class of the Order of Medjidie, and was appointed a Companion of the Order of the Bath (CB).").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.ner.fewnerd").predict("""The Double Down is a sandwich offered by Kentucky Fried Chicken (KFC) restaurants. He did not see active service again until 1882, when he took part in the Anglo-Egyptian War, and was present at the battle of Tell El Kebir (September 1882), for which he was mentioned in dispatches, received the Egypt Medal with clasp and the 3rd class of the Order of Medjidie, and was appointed a Companion of the Order of the Bath (CB).""")
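In the Python snippet, `fullAnnotate` returns a list with one dictionary per input text, keyed by output column, whose values are lists of annotation objects. A minimal sketch of turning the `ner_chunk` annotations into (chunk, label) pairs follows. It assumes each chunk stores its label under the `entity` metadata key (verify against your Spark NLP version), and it uses a hypothetical `Annotation` namedtuple in place of Spark NLP's annotation class so it runs without Spark; `chunks_with_labels` is likewise an illustrative name.

```python
from collections import namedtuple
from typing import Dict, List, Tuple

# Hypothetical stand-in for Spark NLP's annotation objects so the sketch runs
# without Spark; only the .result and .metadata attributes are used.
Annotation = namedtuple("Annotation", ["result", "metadata"])


def chunks_with_labels(annotated: List[Dict[str, list]],
                       column: str = "ner_chunk") -> List[Tuple[str, str]]:
    """Collect (chunk text, entity label) pairs from fullAnnotate-style output."""
    pairs = []
    for doc in annotated:  # one dict per annotated text
        for ann in doc.get(column, []):
            # Assumes the label is stored under the "entity" metadata key.
            pairs.append((ann.result, ann.metadata.get("entity", "")))
    return pairs


# Hand-built input mirroring two rows of the Results table.
sample = [{"ner_chunk": [
    Annotation("Double Down", {"entity": "PRODUCT"}),
    Annotation("Anglo-Egyptian War", {"entity": "EVENT"}),
]}]
print(chunks_with_labels(sample))
# [('Double Down', 'PRODUCT'), ('Anglo-Egyptian War', 'EVENT')]
```

With the real pipeline, the same helper should apply to the `annotations` variable produced by the Python snippet.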
Results
+----------------------------------+---------+
|chunk |ner_label|
+----------------------------------+---------+
|Double Down |PRODUCT |
|Kentucky Fried Chicken |BUILDING |
|KFC |BUILDING |
|Anglo-Egyptian War |EVENT |
|Tell El Kebir |EVENT |
|Egypt Medal |OTHER |
|Order of Medjidie |OTHER |
|Companion of the Order of the Bath|OTHER |
+----------------------------------+---------+
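NerDLModel assigns a tag to each token, and NerConverter merges those token tags into the chunks shown above. The sketch below mimics that merge in plain Python, assuming IOB2 tags (`B-LABEL`, `I-LABEL`, `O`). It is a hypothetical illustration, not Spark NLP's implementation: `iob_to_chunks` is an illustrative name, and it joins tokens with single spaces instead of recovering the original text from character offsets.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2 token tags (B-X, I-X, O) into (chunk text, label) spans."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label)
        if starts_new:
            # Close any open entity; a stray I- with a new label is treated as B-.
            if current_label is not None:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:
            # "O" closes any open entity.
            if current_label is not None:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_label is not None:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hypothetical token tags consistent with the first rows of the Results table.
tokens = ["The", "Double", "Down", "is", "a", "sandwich", "offered", "by",
          "Kentucky", "Fried", "Chicken"]
tags = ["O", "B-PRODUCT", "I-PRODUCT", "O", "O", "O", "O", "O",
        "B-BUILDING", "I-BUILDING", "I-BUILDING"]
print(iob_to_chunks(tokens, tags))
# [('Double Down', 'PRODUCT'), ('Kentucky Fried Chicken', 'BUILDING')]
```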
Model Information
Model Name: nerdl_fewnerd_100d
Type: ner
Compatibility: Spark NLP 3.1.1+
License: Open Source
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
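The Input Labels row says the NER stage consumes the `sentence`, `token`, and `embeddings` columns, so every stage producing them must come earlier in the pipeline. The plain-Python sketch below checks that ordering. The helper `missing_inputs` and the stage list are hypothetical, and the column wiring for the document assembler, sentence detector, and tokenizer is an assumption because those stages are elided in the snippets; the remaining columns follow the snippets.

```python
from typing import Iterable, List, Tuple

# (stage name, input columns, output column), listed in pipeline order.
Stage = Tuple[str, List[str], str]


def missing_inputs(stages: List[Stage],
                   source_cols: Iterable[str] = ("text",)) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs where a stage reads a column no earlier stage produces."""
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may read this column
    return problems


# Hypothetical wiring: the first three stages are elided in the card, so their
# columns here are assumptions.
full = [
    ("document_assembler", ["text"], "document"),
    ("sentencer", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("embeddings", ["sentence", "token"], "embeddings"),
    ("ner", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["document", "token", "ner"], "ner_chunk"),
]
without_sentencer = [s for s in full if s[0] != "sentencer"]

print(missing_inputs(full))  # []
print(missing_inputs(without_sentencer))
# [('tokenizer', 'sentence'), ('embeddings', 'sentence'), ('ner', 'sentence')]
```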
Data Source
Few-NERD: A Few-shot Named Entity Recognition Dataset. Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Hai-Tao Zheng, and Zhiyuan Liu. ACL-IJCNLP 2021.
Benchmarking
+------------+-------+------+-------+-------+---------+------+------+
| entity| tp| fp| fn| total|precision|recall| f1|
+------------+-------+------+-------+-------+---------+------+------+
| PERSON|21555.0|6194.0| 5643.0|27198.0| 0.7768|0.7925|0.7846|
|ORGANIZATION|36744.0|9059.0|13156.0|49900.0| 0.8022|0.7364|0.7679|
| LOCATION|36367.0|7521.0| 7006.0|43373.0| 0.8286|0.8385|0.8335|
| ART| 6170.0|1649.0| 2998.0| 9168.0| 0.7891| 0.673|0.7264|
| BUILDING| 5112.0|2435.0| 3014.0| 8126.0| 0.6774|0.6291|0.6523|
| PRODUCT| 8317.0|3253.0| 4325.0|12642.0| 0.7188|0.6579| 0.687|
| OTHER|14461.0|4414.0| 5161.0|19622.0| 0.7661| 0.737|0.7513|
| EVENT| 6024.0|1880.0| 2275.0| 8299.0| 0.7621|0.7259|0.7436|
+------------+-------+------+-------+-------+---------+------+------+
+------------------+------------------+
|             macro|             micro|
+------------------+------------------+
|0.7433252741184967|0.7703038245945377|
+------------------+------------------+
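Each precision, recall, and F1 value in the benchmark table follows from the tp/fp/fn counts; the total column equals tp + fn, the number of gold entities of that type. The sketch below recomputes the per-entity scores from those counts and shows that the macro value is the unweighted mean of the eight per-entity F1 scores. It uses only numbers copied from the table; `COUNTS` and `prf` are illustrative names.

```python
from typing import Dict, Tuple

# Per-entity counts copied from the Benchmarking table: (tp, fp, fn).
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "PERSON": (21555, 6194, 5643),
    "ORGANIZATION": (36744, 9059, 13156),
    "LOCATION": (36367, 7521, 7006),
    "ART": (6170, 1649, 2998),
    "BUILDING": (5112, 2435, 3014),
    "PRODUCT": (8317, 3253, 4325),
    "OTHER": (14461, 4414, 5161),
    "EVENT": (6024, 1880, 2275),
}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of the two) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = {label: prf(*counts) for label, counts in COUNTS.items()}
# Macro average: every entity type weighs the same, regardless of its frequency.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"{label:>12} {p:.4f} {r:.4f} {f1:.4f}")
print(f"macro F1: {macro_f1:.4f}")  # macro F1: 0.7433
```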