Description
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization. This NER model was trained over the XTREME dataset by using XlmRoBertaEmbeddings (xlm_roberta_xtreme_base).
This NER model covers a subset of the 40 languages included in XTREME (shown here with their ISO 639-1 code):
af
, ar
, bg
, bn
, de
, el
, en
, es
, et
, eu
, fa
, fi
, fr
, he
, hi
, hu
, id
, it
, ja
, jv
, ka
, kk
, ko
, ml
, mr
, ms
, my
, nl
, pt
, ru
, sw
, ta
, te
, th
, tl
, tr
, ur
, vi
, yo
, and zh
Predicted Entities
- B-LOC
- I-LOC
- B-ORG
- I-ORG
- B-PER
- I-PER
How to use
document_assembler = DocumentAssembler() \
.setInputCol('text') \
.setOutputCol('document')
tokenizer = Tokenizer() \
.setInputCols(['document']) \
.setOutputCol('token')
embeddings = XlmRoBertaEmbeddings\
.pretrained('xlm_roberta_xtreme_base', 'xx')\
.setInputCols(["token", "document"])\
.setOutputCol("embeddings")
ner_model = NerDLModel.pretrained('ner_xtreme_xlm_roberta_xtreme_base', 'xx') \
.setInputCols(['document', 'token', 'embeddings']) \
.setOutputCol('ner')
ner_converter = NerConverter() \
.setInputCols(['document', 'token', 'ner']) \
.setOutputCol('entities')
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
embeddings,
ner_model,
ner_converter
])
text_list = [["""Jerome Horsey was a resident of the Russia Company in Moscow from 1572 to 1585."""],
["""Emilie Hartmanns Vater August Hartmann war Lehrer an der Hohen Karlsschule in Stuttgart, bis zu deren Auflösung 1793."""],
["""James Watt nacque in Scozia il 19 gennaio 1736 da genitori presbiteriani."""],
["""Quand j'ai dit à John que je voulais déménager en Alaska, il m'a prévenu que j'aurais du mal à trouver un Starbucks là-bas."""]]
example = spark.createDataFrame(text_list).toDF("text")
result = pipeline.fit(example).transform(example)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val tokenizer = new Tokenizer()
.setInputCols("document")
.setOutputCol("token")
val embeddings = XlmRoBertaEmbeddings.pretrained("xlm_roberta_xtreme_base", "xx")
.setInputCols(Array("document", "token"))
.setOutputCol("embeddings")
val ner_model = NerDLModel.pretrained("ner_xtreme_xlm_roberta_xtreme_base", "xx")
.setInputCols(Array("document", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("document", "token", "ner"))
.setOutputCol("entities")
val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings, ner_model, ner_converter))
val data = Seq(("""Jerome Horsey was a resident of the Russia Company in Moscow from 1572 to 1585."""),
("""Emilie Hartmanns Vater August Hartmann war Lehrer an der Hohen Karlsschule in Stuttgart, bis zu deren Auflösung 1793."""),
("""James Watt nacque in Scozia il 19 gennaio 1736 da genitori presbiteriani."""),
("""Quand j'ai dit à John que je voulais déménager en Alaska, il m'a prévenu que j'aurais du mal à trouver un Starbucks là-bas.""")).toDS.toDF("text"))
val result = pipeline.fit(data).transform(data)
import nlu
text = [["""Jerome Horsey was a resident of the Russia Company in Moscow from 1572 to 1585."""],
["""Emilie Hartmanns Vater August Hartmann war Lehrer an der Hohen Karlsschule in Stuttgart, bis zu deren Auflösung 1793."""],
["""James Watt nacque in Scozia il 19 gennaio 1736 da genitori presbiteriani."""],
["""Quand j'ai dit à John que je voulais déménager en Alaska, il m'a prévenu que j'aurais du mal à trouver un Starbucks là-bas."""]]
ner_df = nlu.load('xx.ner.ner_xtreme_xlm_roberta_xtreme_base').predict(text, output_level='token')
Results
+-----------------+---------+
|chunk |ner_label|
+-----------------+---------+
|Jerome Horsey |PER |
|Russia Company |ORG |
|Moscow |LOC |
|Emilie Hartmanns |PER |
|August Hartmann |PER |
|Hohen Karlsschule|ORG |
|Stuttgart |LOC |
|James Watt |PER |
|Scozia |LOC |
|John |PER |
|Alaska |LOC |
|Starbucks |ORG |
+-----------------+---------+
Model Information
Model Name: | ner_xtreme_xlm_roberta_xtreme_base |
Type: | ner |
Compatibility: | Spark NLP 3.1.3+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | xx |
Data Source
https://github.com/google-research/xtreme
Benchmarking
Average of all languages benchmark (multi-label classification and CoNLL Eval):
precision recall f1-score support
B-LOC 0.87 0.89 0.88 129861
I-ORG 0.82 0.84 0.83 291145
I-MISC 0.00 0.00 0.00 0
I-LOC 0.81 0.84 0.83 179310
I-PER 0.87 0.89 0.88 234076
B-MISC 0.00 0.00 0.00 0
B-ORG 0.85 0.80 0.82 105547
B-PER 0.91 0.90 0.91 114118
micro avg 0.85 0.86 0.85 1054057
macro avg 0.64 0.64 0.64 1054057
weighted avg 0.85 0.86 0.85 1054057
processed 2928018 tokens with 349526 phrases; found: 344025 phrases; correct: 292983.
accuracy: 85.78%; (non-O)
accuracy: 92.66%; precision: 85.16%; recall: 83.82%; FB1: 84.49
LOC: precision: 84.65%; recall: 86.65%; FB1: 85.64 132937
ORG: precision: 81.27%; recall: 76.33%; FB1: 78.72 99127
PER: precision: 89.22%; recall: 87.53%; FB1: 88.37 111961
###############################
Language by language benchmarks (multi-label classification and CoNLL Eval):
lang: af
precision recall f1-score support
B-LOC 0.84 0.92 0.88 562
I-ORG 0.92 0.93 0.92 786
I-LOC 0.70 0.80 0.74 198
I-PER 0.94 0.97 0.95 504
B-ORG 0.92 0.84 0.88 569
B-PER 0.92 0.96 0.94 356
micro avg 0.89 0.91 0.90 2975
macro avg 0.87 0.90 0.89 2975
weighted avg 0.89 0.91 0.90 2975
processed 10808 tokens with 1487 phrases; found: 1506 phrases; correct: 1323.
accuracy: 91.29%; (non-O)
accuracy: 96.50%; precision: 87.85%; recall: 88.97%; FB1: 88.41
LOC: precision: 82.52%; recall: 90.75%; FB1: 86.44 618
ORG: precision: 91.52%; recall: 83.48%; FB1: 87.32 519
PER: precision: 91.60%; recall: 94.94%; FB1: 93.24 369
###############################
lang: ar
precision recall f1-score support
B-LOC 0.87 0.90 0.88 3780
I-ORG 0.89 0.89 0.89 10045
I-LOC 0.90 0.93 0.92 9073
I-PER 0.90 0.89 0.89 7937
B-ORG 0.89 0.82 0.85 3629
B-PER 0.88 0.88 0.88 3850
micro avg 0.89 0.89 0.89 38314
macro avg 0.89 0.88 0.89 38314
weighted avg 0.89 0.89 0.89 38314
processed 64347 tokens with 11259 phrases; found: 11109 phrases; correct: 9447.
accuracy: 89.22%; (non-O)
accuracy: 92.67%; precision: 85.04%; recall: 83.91%; FB1: 84.47
LOC: precision: 85.04%; recall: 88.15%; FB1: 86.57 3918
ORG: precision: 85.00%; recall: 78.86%; FB1: 81.82 3367
PER: precision: 85.07%; recall: 84.49%; FB1: 84.78 3824
###############################
lang: bg
precision recall f1-score support
B-LOC 0.92 0.95 0.94 6436
I-ORG 0.91 0.89 0.90 7964
I-LOC 0.85 0.89 0.87 3213
I-PER 0.91 0.94 0.93 4982
B-ORG 0.88 0.82 0.85 3670
B-PER 0.92 0.94 0.93 3954
micro avg 0.91 0.91 0.91 30219
macro avg 0.90 0.91 0.90 30219
weighted avg 0.91 0.91 0.91 30219
processed 83463 tokens with 14060 phrases; found: 14076 phrases; correct: 12687.
accuracy: 91.03%; (non-O)
accuracy: 95.94%; precision: 90.13%; recall: 90.23%; FB1: 90.18
LOC: precision: 91.61%; recall: 94.33%; FB1: 92.95 6627
ORG: precision: 85.89%; recall: 79.81%; FB1: 82.74 3410
PER: precision: 91.28%; recall: 93.25%; FB1: 92.26 4039
###############################
lang: bn
precision recall f1-score support
B-LOC 0.85 0.93 0.89 393
I-ORG 0.93 0.91 0.92 1031
I-LOC 0.86 0.91 0.89 703
I-PER 0.95 0.93 0.94 731
B-ORG 0.92 0.90 0.91 349
B-PER 0.95 0.92 0.94 347
micro avg 0.91 0.92 0.91 3554
macro avg 0.91 0.92 0.91 3554
weighted avg 0.92 0.92 0.91 3554
processed 4377 tokens with 1089 phrases; found: 1100 phrases; correct: 979.
accuracy: 91.50%; (non-O)
accuracy: 92.28%; precision: 89.00%; recall: 89.90%; FB1: 89.45
LOC: precision: 84.04%; recall: 91.09%; FB1: 87.42 426
ORG: precision: 90.56%; recall: 87.97%; FB1: 89.24 339
PER: precision: 93.73%; recall: 90.49%; FB1: 92.08 335
###############################
lang: de
precision recall f1-score support
B-LOC 0.86 0.89 0.87 4961
I-ORG 0.88 0.87 0.87 6043
I-LOC 0.80 0.80 0.80 2289
I-PER 0.96 0.94 0.95 6792
B-ORG 0.82 0.79 0.81 4157
B-PER 0.95 0.92 0.94 4750
micro avg 0.89 0.88 0.89 28992
macro avg 0.88 0.87 0.87 28992
weighted avg 0.89 0.88 0.89 28992
processed 97646 tokens with 13868 phrases; found: 13738 phrases; correct: 11809.
accuracy: 88.27%; (non-O)
accuracy: 95.84%; precision: 85.96%; recall: 85.15%; FB1: 85.55
LOC: precision: 84.20%; recall: 87.32%; FB1: 85.73 5145
ORG: precision: 79.33%; recall: 76.52%; FB1: 77.90 4010
PER: precision: 93.74%; recall: 90.44%; FB1: 92.06 4583
###############################
lang: el
precision recall f1-score support
B-LOC 0.88 0.91 0.89 4476
I-ORG 0.89 0.88 0.89 6685
I-LOC 0.72 0.76 0.74 1919
I-PER 0.91 0.94 0.92 5392
B-ORG 0.88 0.83 0.86 3655
B-PER 0.91 0.93 0.92 4032
micro avg 0.88 0.89 0.88 26159
macro avg 0.86 0.88 0.87 26159
weighted avg 0.88 0.89 0.88 26159
processed 90666 tokens with 12164 phrases; found: 12254 phrases; correct: 10675.
accuracy: 89.03%; (non-O)
accuracy: 95.89%; precision: 87.11%; recall: 87.76%; FB1: 87.44
LOC: precision: 86.10%; recall: 89.57%; FB1: 87.80 4656
ORG: precision: 86.00%; recall: 81.34%; FB1: 83.61 3457
PER: precision: 89.18%; recall: 91.57%; FB1: 90.36 4141
###############################
lang: en
precision recall f1-score support
B-LOC 0.82 0.87 0.84 4657
I-ORG 0.83 0.85 0.84 11607
I-LOC 0.86 0.73 0.79 6447
I-PER 0.89 0.88 0.88 7480
B-ORG 0.82 0.77 0.79 4745
B-PER 0.90 0.91 0.91 4556
micro avg 0.85 0.84 0.84 39492
macro avg 0.85 0.83 0.84 39492
weighted avg 0.85 0.84 0.84 39492
processed 80326 tokens with 13958 phrases; found: 13975 phrases; correct: 11183.
accuracy: 83.53%; (non-O)
accuracy: 90.98%; precision: 80.02%; recall: 80.12%; FB1: 80.07
LOC: precision: 75.34%; recall: 79.30%; FB1: 77.27 4902
ORG: precision: 76.77%; recall: 71.80%; FB1: 74.20 4438
PER: precision: 88.09%; recall: 89.62%; FB1: 88.85 4635
###############################
lang: es
precision recall f1-score support
B-LOC 0.92 0.92 0.92 4725
I-ORG 0.89 0.92 0.91 11371
I-LOC 0.86 0.86 0.86 6601
I-PER 0.95 0.91 0.93 7004
B-ORG 0.88 0.88 0.88 3576
B-PER 0.95 0.93 0.94 3959
micro avg 0.91 0.91 0.91 37236
macro avg 0.91 0.90 0.91 37236
weighted avg 0.91 0.91 0.91 37236
processed 64727 tokens with 12260 phrases; found: 12210 phrases; correct: 11032.
accuracy: 90.55%; (non-O)
accuracy: 94.06%; precision: 90.35%; recall: 89.98%; FB1: 90.17
LOC: precision: 90.16%; recall: 90.73%; FB1: 90.44 4755
ORG: precision: 86.45%; recall: 86.35%; FB1: 86.40 3572
PER: precision: 94.18%; recall: 92.37%; FB1: 93.27 3883
###############################
lang: et
precision recall f1-score support
B-LOC 0.91 0.94 0.92 5888
I-ORG 0.90 0.88 0.89 5731
I-LOC 0.84 0.85 0.85 2467
I-PER 0.96 0.94 0.95 5471
B-ORG 0.89 0.82 0.86 3875
B-PER 0.95 0.95 0.95 4129
micro avg 0.92 0.90 0.91 27561
macro avg 0.91 0.90 0.90 27561
weighted avg 0.92 0.90 0.91 27561
processed 80485 tokens with 13892 phrases; found: 13760 phrases; correct: 12281.
accuracy: 90.48%; (non-O)
accuracy: 96.05%; precision: 89.25%; recall: 88.40%; FB1: 88.83
LOC: precision: 88.42%; recall: 91.53%; FB1: 89.94 6095
ORG: precision: 85.56%; recall: 78.89%; FB1: 82.09 3573
PER: precision: 93.72%; recall: 92.88%; FB1: 93.30 4092
###############################
lang: eu
precision recall f1-score support
B-LOC 0.91 0.94 0.93 5682
I-ORG 0.91 0.84 0.87 5560
I-LOC 0.79 0.89 0.84 2876
I-PER 0.95 0.94 0.94 5449
B-ORG 0.91 0.81 0.86 3669
B-PER 0.94 0.93 0.93 4108
micro avg 0.91 0.90 0.90 27344
macro avg 0.90 0.89 0.90 27344
weighted avg 0.91 0.90 0.90 27344
processed 90661 tokens with 13459 phrases; found: 13219 phrases; correct: 11812.
accuracy: 89.68%; (non-O)
accuracy: 96.37%; precision: 89.36%; recall: 87.76%; FB1: 88.55
LOC: precision: 88.89%; recall: 91.99%; FB1: 90.42 5880
ORG: precision: 87.56%; recall: 78.30%; FB1: 82.68 3281
PER: precision: 91.47%; recall: 90.36%; FB1: 90.91 4058
###############################
lang: fa
precision recall f1-score support
B-LOC 0.91 0.92 0.92 3663
I-ORG 0.94 0.96 0.95 13255
I-LOC 0.92 0.93 0.92 8547
I-PER 0.94 0.92 0.93 7900
B-ORG 0.91 0.91 0.91 3535
B-PER 0.93 0.91 0.92 3544
micro avg 0.93 0.93 0.93 40444
macro avg 0.93 0.92 0.92 40444
weighted avg 0.93 0.93 0.93 40444
processed 59491 tokens with 10742 phrases; found: 10702 phrases; correct: 9699.
accuracy: 93.10%; (non-O)
accuracy: 94.77%; precision: 90.63%; recall: 90.29%; FB1: 90.46
LOC: precision: 89.42%; recall: 90.66%; FB1: 90.04 3714
ORG: precision: 90.29%; recall: 89.99%; FB1: 90.14 3523
PER: precision: 92.27%; recall: 90.21%; FB1: 91.23 3465
###############################
lang: fi
precision recall f1-score support
B-LOC 0.89 0.92 0.90 5629
I-ORG 0.90 0.89 0.90 5522
I-LOC 0.69 0.75 0.72 1096
I-PER 0.96 0.96 0.96 5437
B-ORG 0.88 0.82 0.85 4180
B-PER 0.95 0.95 0.95 4745
micro avg 0.91 0.90 0.91 26609
macro avg 0.88 0.88 0.88 26609
weighted avg 0.91 0.90 0.91 26609
processed 83660 tokens with 14554 phrases; found: 14403 phrases; correct: 12760.
accuracy: 90.35%; (non-O)
accuracy: 96.03%; precision: 88.59%; recall: 87.67%; FB1: 88.13
LOC: precision: 86.57%; recall: 88.99%; FB1: 87.76 5786
ORG: precision: 84.51%; recall: 78.47%; FB1: 81.38 3881
PER: precision: 94.40%; recall: 94.23%; FB1: 94.31 4736
###############################
lang: fr
precision recall f1-score support
B-LOC 0.90 0.89 0.89 4985
I-ORG 0.87 0.91 0.89 10386
I-LOC 0.84 0.85 0.85 5859
I-PER 0.93 0.89 0.91 6528
B-ORG 0.86 0.86 0.86 3885
B-PER 0.95 0.93 0.94 4499
micro avg 0.89 0.89 0.89 36142
macro avg 0.89 0.89 0.89 36142
weighted avg 0.89 0.89 0.89 36142
processed 68754 tokens with 13369 phrases; found: 13165 phrases; correct: 11668.
accuracy: 89.13%; (non-O)
accuracy: 93.40%; precision: 88.63%; recall: 87.28%; FB1: 87.95
LOC: precision: 87.65%; recall: 86.26%; FB1: 86.95 4906
ORG: precision: 83.51%; recall: 82.88%; FB1: 83.19 3856
PER: precision: 94.21%; recall: 92.20%; FB1: 93.19 4403
###############################
lang: he
precision recall f1-score support
B-LOC 0.86 0.83 0.84 5160
I-ORG 0.79 0.82 0.80 6907
I-LOC 0.78 0.77 0.77 3133
I-PER 0.87 0.90 0.88 6816
B-ORG 0.79 0.74 0.76 4142
B-PER 0.85 0.87 0.86 4396
micro avg 0.83 0.83 0.83 30554
macro avg 0.82 0.82 0.82 30554
weighted avg 0.83 0.83 0.83 30554
processed 85418 tokens with 13698 phrases; found: 13352 phrases; correct: 10645.
accuracy: 83.01%; (non-O)
accuracy: 92.44%; precision: 79.73%; recall: 77.71%; FB1: 78.71
LOC: precision: 82.18%; recall: 80.00%; FB1: 81.08 5023
ORG: precision: 73.53%; recall: 68.32%; FB1: 70.83 3849
PER: precision: 82.30%; recall: 83.87%; FB1: 83.08 4480
###############################
lang: hi
precision recall f1-score support
B-LOC 0.84 0.86 0.85 414
I-ORG 0.91 0.88 0.90 1123
I-LOC 0.78 0.73 0.75 398
I-PER 0.85 0.92 0.88 598
B-ORG 0.89 0.86 0.87 364
B-PER 0.90 0.92 0.91 450
micro avg 0.87 0.87 0.87 3347
macro avg 0.86 0.86 0.86 3347
weighted avg 0.87 0.87 0.87 3347
processed 6005 tokens with 1228 phrases; found: 1239 phrases; correct: 1039.
accuracy: 87.00%; (non-O)
accuracy: 91.07%; precision: 83.86%; recall: 84.61%; FB1: 84.23
LOC: precision: 78.87%; recall: 81.16%; FB1: 80.00 426
ORG: precision: 84.94%; recall: 82.14%; FB1: 83.52 352
PER: precision: 87.64%; recall: 89.78%; FB1: 88.69 461
###############################
lang: hu
precision recall f1-score support
B-LOC 0.91 0.94 0.92 5671
I-ORG 0.89 0.91 0.90 5341
I-LOC 0.80 0.84 0.82 2404
I-PER 0.96 0.96 0.96 5501
B-ORG 0.90 0.86 0.88 3982
B-PER 0.96 0.95 0.95 4510
micro avg 0.91 0.92 0.92 27409
macro avg 0.90 0.91 0.91 27409
weighted avg 0.91 0.92 0.92 27409
processed 90302 tokens with 14163 phrases; found: 14084 phrases; correct: 12672.
accuracy: 91.85%; (non-O)
accuracy: 96.53%; precision: 89.97%; recall: 89.47%; FB1: 89.72
LOC: precision: 88.42%; recall: 90.78%; FB1: 89.58 5822
ORG: precision: 87.50%; recall: 83.12%; FB1: 85.25 3783
PER: precision: 94.08%; recall: 93.44%; FB1: 93.76 4479
###############################
lang: id
precision recall f1-score support
B-LOC 0.92 0.95 0.94 3745
I-ORG 0.91 0.93 0.92 8584
I-LOC 0.95 0.96 0.96 7809
I-PER 0.95 0.92 0.93 6520
B-ORG 0.91 0.89 0.90 3733
B-PER 0.94 0.93 0.93 3969
micro avg 0.93 0.93 0.93 34360
macro avg 0.93 0.93 0.93 34360
weighted avg 0.93 0.93 0.93 34360
processed 61834 tokens with 11447 phrases; found: 11423 phrases; correct: 10383.
accuracy: 93.31%; (non-O)
accuracy: 95.58%; precision: 90.90%; recall: 90.70%; FB1: 90.80
LOC: precision: 90.82%; recall: 93.56%; FB1: 92.17 3858
ORG: precision: 88.71%; recall: 86.93%; FB1: 87.81 3658
PER: precision: 93.01%; recall: 91.56%; FB1: 92.28 3907
###############################
lang: it
precision recall f1-score support
B-LOC 0.91 0.89 0.90 4820
I-ORG 0.89 0.91 0.90 9222
I-LOC 0.85 0.83 0.84 4366
I-PER 0.94 0.94 0.94 5794
B-ORG 0.89 0.87 0.88 4087
B-PER 0.96 0.96 0.96 4842
micro avg 0.91 0.90 0.90 33131
macro avg 0.90 0.90 0.90 33131
weighted avg 0.91 0.90 0.90 33131
processed 80871 tokens with 13749 phrases; found: 13514 phrases; correct: 12168.
accuracy: 90.32%; (non-O)
accuracy: 95.39%; precision: 90.04%; recall: 88.50%; FB1: 89.26
LOC: precision: 89.17%; recall: 86.62%; FB1: 87.88 4682
ORG: precision: 85.45%; recall: 83.31%; FB1: 84.37 3985
PER: precision: 94.66%; recall: 94.75%; FB1: 94.71 4847
###############################
lang: ja
precision recall f1-score support
B-LOC 0.74 0.77 0.76 5093
I-ORG 0.65 0.65 0.65 24814
I-LOC 0.72 0.77 0.75 17274
I-PER 0.77 0.79 0.78 21730
B-ORG 0.59 0.63 0.61 4267
B-PER 0.80 0.72 0.76 4081
micro avg 0.71 0.73 0.72 77259
macro avg 0.71 0.72 0.72 77259
weighted avg 0.71 0.73 0.72 77259
processed 306439 tokens with 13971 phrases; found: 13463 phrases; correct: 9267.
accuracy: 72.88%; (non-O)
accuracy: 88.72%; precision: 68.83%; recall: 66.33%; FB1: 67.56
LOC: precision: 71.74%; recall: 73.76%; FB1: 72.74 5298
ORG: precision: 59.48%; recall: 57.56%; FB1: 58.50 4514
PER: precision: 76.17%; recall: 66.96%; FB1: 71.27 3651
###############################
lang: jv
precision recall f1-score support
B-LOC 0.85 0.85 0.85 52
I-ORG 0.84 0.89 0.87 66
I-LOC 0.78 0.93 0.85 43
I-PER 0.93 0.95 0.94 44
B-ORG 0.79 0.78 0.78 40
B-PER 0.92 0.96 0.94 25
micro avg 0.85 0.89 0.87 270
macro avg 0.85 0.89 0.87 270
weighted avg 0.85 0.89 0.87 270
processed 678 tokens with 117 phrases; found: 117 phrases; correct: 95.
accuracy: 88.89%; (non-O)
accuracy: 92.92%; precision: 81.20%; recall: 81.20%; FB1: 81.20
LOC: precision: 78.85%; recall: 78.85%; FB1: 78.85 52
ORG: precision: 79.49%; recall: 77.50%; FB1: 78.48 39
PER: precision: 88.46%; recall: 92.00%; FB1: 90.20 26
###############################
lang: ka
precision recall f1-score support
B-LOC 0.86 0.90 0.88 5288
I-ORG 0.92 0.89 0.90 7800
I-LOC 0.76 0.84 0.80 2191
I-PER 0.91 0.95 0.93 4666
B-ORG 0.89 0.76 0.82 3807
B-PER 0.88 0.92 0.90 3962
micro avg 0.88 0.88 0.88 27714
macro avg 0.87 0.88 0.87 27714
weighted avg 0.88 0.88 0.88 27714
processed 81921 tokens with 13057 phrases; found: 12917 phrases; correct: 10903.
accuracy: 88.35%; (non-O)
accuracy: 94.72%; precision: 84.41%; recall: 83.50%; FB1: 83.95
LOC: precision: 83.37%; recall: 86.82%; FB1: 85.06 5507
ORG: precision: 84.01%; recall: 72.31%; FB1: 77.72 3277
PER: precision: 86.11%; recall: 89.83%; FB1: 87.93 4133
###############################
lang: kk
precision recall f1-score support
B-LOC 0.73 0.97 0.83 383
I-ORG 0.92 0.84 0.88 592
I-LOC 0.55 0.64 0.59 210
I-PER 0.90 0.97 0.93 466
B-ORG 0.86 0.64 0.74 355
B-PER 0.90 0.91 0.91 377
micro avg 0.83 0.85 0.84 2383
macro avg 0.81 0.83 0.81 2383
weighted avg 0.84 0.85 0.84 2383
processed 7936 tokens with 1115 phrases; found: 1157 phrases; correct: 858.
accuracy: 85.14%; (non-O)
accuracy: 93.47%; precision: 74.16%; recall: 76.95%; FB1: 75.53
LOC: precision: 61.06%; recall: 81.46%; FB1: 69.80 511
ORG: precision: 80.68%; recall: 60.00%; FB1: 68.82 264
PER: precision: 87.17%; recall: 88.33%; FB1: 87.75 382
###############################
lang: ko
precision recall f1-score support
B-LOC 0.88 0.91 0.89 5855
I-ORG 0.83 0.85 0.84 5437
I-LOC 0.83 0.88 0.85 2712
I-PER 0.87 0.88 0.88 3468
B-ORG 0.84 0.77 0.80 4319
B-PER 0.87 0.83 0.85 4249
micro avg 0.86 0.85 0.85 26040
macro avg 0.85 0.85 0.85 26040
weighted avg 0.86 0.85 0.85 26040
processed 80838 tokens with 14423 phrases; found: 14035 phrases; correct: 11713.
accuracy: 85.26%; (non-O)
accuracy: 93.54%; precision: 83.46%; recall: 81.21%; FB1: 82.32
LOC: precision: 85.66%; recall: 88.76%; FB1: 87.18 6067
ORG: precision: 78.26%; recall: 71.43%; FB1: 74.69 3942
PER: precision: 85.22%; recall: 80.75%; FB1: 82.92 4026
###############################
lang: ml
precision recall f1-score support
B-LOC 0.82 0.86 0.84 443
I-ORG 0.90 0.88 0.89 774
I-LOC 0.80 0.75 0.77 219
I-PER 0.89 0.93 0.91 492
B-ORG 0.86 0.77 0.81 354
B-PER 0.87 0.89 0.88 407
micro avg 0.87 0.86 0.87 2689
macro avg 0.86 0.85 0.85 2689
weighted avg 0.87 0.86 0.86 2689
processed 6727 tokens with 1204 phrases; found: 1195 phrases; correct: 974.
accuracy: 86.31%; (non-O)
accuracy: 92.81%; precision: 81.51%; recall: 80.90%; FB1: 81.20
LOC: precision: 79.00%; recall: 82.39%; FB1: 80.66 462
ORG: precision: 80.57%; recall: 71.47%; FB1: 75.75 314
PER: precision: 84.96%; recall: 87.47%; FB1: 86.20 419
###############################
lang: mr
precision recall f1-score support
B-LOC 0.85 0.86 0.86 525
I-ORG 0.89 0.93 0.91 852
I-LOC 0.71 0.67 0.69 258
I-PER 0.91 0.92 0.92 598
B-ORG 0.87 0.80 0.83 364
B-PER 0.86 0.92 0.89 375
micro avg 0.87 0.88 0.87 2972
macro avg 0.85 0.85 0.85 2972
weighted avg 0.87 0.88 0.87 2972
processed 7356 tokens with 1264 phrases; found: 1267 phrases; correct: 1066.
accuracy: 87.69%; (non-O)
accuracy: 93.50%; precision: 84.14%; recall: 84.34%; FB1: 84.24
LOC: precision: 83.27%; recall: 84.38%; FB1: 83.82 532
ORG: precision: 84.78%; recall: 78.02%; FB1: 81.26 335
PER: precision: 84.75%; recall: 90.40%; FB1: 87.48 400
###############################
lang: ms
precision recall f1-score support
B-LOC 0.94 0.98 0.96 367
I-ORG 0.90 0.91 0.91 913
I-LOC 0.96 0.98 0.97 898
I-PER 0.91 0.90 0.91 555
B-ORG 0.90 0.86 0.88 375
B-PER 0.91 0.92 0.92 373
micro avg 0.92 0.93 0.93 3481
macro avg 0.92 0.93 0.92 3481
weighted avg 0.92 0.93 0.93 3481
processed 5874 tokens with 1115 phrases; found: 1120 phrases; correct: 1010.
accuracy: 93.08%; (non-O)
accuracy: 94.91%; precision: 90.18%; recall: 90.58%; FB1: 90.38
LOC: precision: 93.72%; recall: 97.55%; FB1: 95.59 382
ORG: precision: 87.22%; recall: 83.73%; FB1: 85.44 360
PER: precision: 89.42%; recall: 90.62%; FB1: 90.01 378
###############################
lang: my
precision recall f1-score support
B-LOC 0.61 0.93 0.74 56
I-ORG 0.87 0.71 0.78 68
I-LOC 0.15 1.00 0.26 4
I-PER 0.85 0.63 0.72 46
B-ORG 0.90 0.58 0.70 33
B-PER 0.90 0.60 0.72 30
micro avg 0.70 0.72 0.71 237
macro avg 0.72 0.74 0.65 237
weighted avg 0.80 0.72 0.73 237
processed 756 tokens with 119 phrases; found: 126 phrases; correct: 83.
accuracy: 71.73%; (non-O)
accuracy: 86.77%; precision: 65.87%; recall: 69.75%; FB1: 67.76
LOC: precision: 60.00%; recall: 91.07%; FB1: 72.34 85
ORG: precision: 80.95%; recall: 51.52%; FB1: 62.96 21
PER: precision: 75.00%; recall: 50.00%; FB1: 60.00 20
###############################
lang: nl
precision recall f1-score support
B-LOC 0.89 0.93 0.91 5133
I-ORG 0.90 0.88 0.89 6693
I-LOC 0.86 0.86 0.86 3662
I-PER 0.95 0.94 0.95 6371
B-ORG 0.89 0.85 0.87 3908
B-PER 0.96 0.94 0.95 4684
micro avg 0.91 0.90 0.91 30451
macro avg 0.91 0.90 0.90 30451
weighted avg 0.91 0.90 0.91 30451
processed 85122 tokens with 13725 phrases; found: 13653 phrases; correct: 12219.
accuracy: 90.42%; (non-O)
accuracy: 96.01%; precision: 89.50%; recall: 89.03%; FB1: 89.26
LOC: precision: 87.24%; recall: 90.71%; FB1: 88.94 5337
ORG: precision: 86.72%; recall: 82.19%; FB1: 84.39 3704
PER: precision: 94.34%; recall: 92.89%; FB1: 93.61 4612
###############################
lang: pt
precision recall f1-score support
B-LOC 0.91 0.92 0.92 4779
I-ORG 0.89 0.92 0.91 10542
I-LOC 0.88 0.89 0.88 6467
I-PER 0.96 0.92 0.94 7310
B-ORG 0.89 0.88 0.88 3753
B-PER 0.95 0.93 0.94 4291
micro avg 0.91 0.91 0.91 37142
macro avg 0.91 0.91 0.91 37142
weighted avg 0.91 0.91 0.91 37142
processed 63647 tokens with 12823 phrases; found: 12725 phrases; correct: 11471.
accuracy: 91.00%; (non-O)
accuracy: 94.15%; precision: 90.15%; recall: 89.46%; FB1: 89.80
LOC: precision: 89.40%; recall: 90.86%; FB1: 90.12 4857
ORG: precision: 86.02%; recall: 84.95%; FB1: 85.48 3706
PER: precision: 94.69%; recall: 91.84%; FB1: 93.25 4162
###############################
lang: ru
precision recall f1-score support
B-LOC 0.88 0.90 0.89 4560
I-ORG 0.89 0.86 0.88 8008
I-LOC 0.83 0.86 0.84 3060
I-PER 0.95 0.97 0.96 7544
B-ORG 0.88 0.80 0.84 4074
B-PER 0.92 0.96 0.94 3543
micro avg 0.90 0.90 0.90 30789
macro avg 0.89 0.89 0.89 30789
weighted avg 0.90 0.90 0.90 30789
processed 71288 tokens with 12177 phrases; found: 12036 phrases; correct: 10465.
accuracy: 89.74%; (non-O)
accuracy: 94.44%; precision: 86.95%; recall: 85.94%; FB1: 86.44
LOC: precision: 86.28%; recall: 88.53%; FB1: 87.39 4679
ORG: precision: 83.80%; recall: 75.41%; FB1: 79.38 3666
PER: precision: 90.92%; recall: 94.72%; FB1: 92.78 3691
###############################
lang: sw
precision recall f1-score support
B-LOC 0.83 0.94 0.88 388
I-ORG 0.86 0.79 0.82 763
I-LOC 0.76 0.88 0.81 568
I-PER 0.95 0.95 0.95 744
B-ORG 0.91 0.82 0.86 374
B-PER 0.95 0.95 0.95 432
micro avg 0.87 0.88 0.88 3269
macro avg 0.88 0.89 0.88 3269
weighted avg 0.88 0.88 0.88 3269
processed 5786 tokens with 1194 phrases; found: 1209 phrases; correct: 1042.
accuracy: 88.35%; (non-O)
accuracy: 92.02%; precision: 86.19%; recall: 87.27%; FB1: 86.72
LOC: precision: 77.63%; recall: 87.63%; FB1: 82.32 438
ORG: precision: 88.17%; recall: 79.68%; FB1: 83.71 338
PER: precision: 93.30%; recall: 93.52%; FB1: 93.41 433
###############################
lang: ta
precision recall f1-score support
B-LOC 0.82 0.86 0.84 436
I-ORG 0.84 0.87 0.85 814
I-LOC 0.76 0.68 0.72 239
I-PER 0.90 0.94 0.92 615
B-ORG 0.82 0.77 0.79 383
B-PER 0.86 0.90 0.88 422
micro avg 0.84 0.86 0.85 2909
macro avg 0.83 0.84 0.83 2909
weighted avg 0.84 0.86 0.85 2909
processed 7234 tokens with 1241 phrases; found: 1252 phrases; correct: 1006.
accuracy: 85.87%; (non-O)
accuracy: 92.31%; precision: 80.35%; recall: 81.06%; FB1: 80.71
LOC: precision: 80.04%; recall: 83.72%; FB1: 81.84 456
ORG: precision: 76.26%; recall: 71.28%; FB1: 73.68 358
PER: precision: 84.02%; recall: 87.20%; FB1: 85.58 438
###############################
lang: te
precision recall f1-score support
B-LOC 0.77 0.89 0.82 450
I-ORG 0.87 0.79 0.83 633
I-LOC 0.67 0.82 0.74 178
I-PER 0.80 0.86 0.83 294
B-ORG 0.75 0.70 0.72 340
B-PER 0.79 0.80 0.80 381
micro avg 0.79 0.81 0.80 2276
macro avg 0.78 0.81 0.79 2276
weighted avg 0.80 0.81 0.80 2276
processed 8155 tokens with 1171 phrases; found: 1226 phrases; correct: 890.
accuracy: 81.06%; (non-O)
accuracy: 92.36%; precision: 72.59%; recall: 76.00%; FB1: 74.26
LOC: precision: 73.04%; recall: 84.89%; FB1: 78.52 523
ORG: precision: 69.09%; recall: 64.41%; FB1: 66.67 317
PER: precision: 74.87%; recall: 75.85%; FB1: 75.36 386
###############################
lang: th
precision recall f1-score support
B-LOC 0.75 0.72 0.74 6430
I-ORG 0.67 0.71 0.69 56669
I-LOC 0.75 0.80 0.77 47216
I-PER 0.76 0.78 0.77 57226
B-ORG 0.55 0.53 0.54 5136
B-PER 0.45 0.62 0.53 5297
micro avg 0.71 0.75 0.73 177974
macro avg 0.66 0.69 0.67 177974
weighted avg 0.71 0.75 0.73 177974
processed 626147 tokens with 20775 phrases; found: 18317 phrases; correct: 12096.
accuracy: 74.70%; (non-O)
accuracy: 88.27%; precision: 66.04%; recall: 58.22%; FB1: 61.88
LOC: precision: 67.74%; recall: 63.63%; FB1: 65.62 6144
ORG: precision: 55.11%; recall: 45.64%; FB1: 49.93 4916
PER: precision: 72.00%; recall: 62.96%; FB1: 67.18 7257
###############################
lang: tl
precision recall f1-score support
B-LOC 0.87 0.90 0.88 327
I-ORG 0.88 0.93 0.90 1045
I-LOC 0.93 0.85 0.89 706
I-PER 0.91 0.89 0.90 813
B-ORG 0.87 0.90 0.89 341
B-PER 0.90 0.90 0.90 366
micro avg 0.90 0.90 0.90 3598
macro avg 0.89 0.90 0.90 3598
weighted avg 0.90 0.90 0.90 3598
processed 4627 tokens with 1034 phrases; found: 1057 phrases; correct: 908.
accuracy: 89.88%; (non-O)
accuracy: 91.16%; precision: 85.90%; recall: 87.81%; FB1: 86.85
LOC: precision: 81.82%; recall: 85.32%; FB1: 83.53 341
ORG: precision: 86.04%; recall: 88.56%; FB1: 87.28 351
PER: precision: 89.59%; recall: 89.34%; FB1: 89.47 365
###############################
lang: tr
precision recall f1-score support
B-LOC 0.91 0.91 0.91 4914
I-ORG 0.88 0.94 0.91 6979
I-LOC 0.85 0.83 0.84 3005
I-PER 0.95 0.94 0.94 5694
B-ORG 0.89 0.89 0.89 4154
B-PER 0.95 0.93 0.94 4519
micro avg 0.91 0.92 0.91 29265
macro avg 0.91 0.91 0.91 29265
weighted avg 0.91 0.92 0.91 29265
processed 75731 tokens with 13587 phrases; found: 13482 phrases; correct: 12147.
accuracy: 91.66%; (non-O)
accuracy: 95.91%; precision: 90.10%; recall: 89.40%; FB1: 89.75
LOC: precision: 88.93%; recall: 88.93%; FB1: 88.93 4914
ORG: precision: 87.10%; recall: 87.10%; FB1: 87.10 4154
PER: precision: 94.22%; recall: 92.03%; FB1: 93.12 4414
###############################
lang: ur
precision recall f1-score support
B-LOC 0.87 0.94 0.90 334
I-ORG 0.95 0.85 0.90 1005
I-LOC 0.86 0.96 0.91 904
I-PER 0.93 0.97 0.95 928
B-ORG 0.96 0.84 0.89 323
B-PER 0.93 0.95 0.94 363
micro avg 0.91 0.92 0.92 3857
macro avg 0.92 0.92 0.92 3857
weighted avg 0.92 0.92 0.92 3857
processed 5027 tokens with 1020 phrases; found: 1017 phrases; correct: 916.
accuracy: 92.20%; (non-O)
accuracy: 93.06%; precision: 90.07%; recall: 89.80%; FB1: 89.94
LOC: precision: 85.08%; recall: 92.22%; FB1: 88.51 362
ORG: precision: 95.74%; recall: 83.59%; FB1: 89.26 282
PER: precision: 90.62%; recall: 93.11%; FB1: 91.85 373
###############################
lang: vi
precision recall f1-score support
B-LOC 0.89 0.92 0.91 3717
I-ORG 0.90 0.92 0.91 13562
I-LOC 0.90 0.91 0.90 8018
I-PER 0.92 0.91 0.92 7787
B-ORG 0.90 0.86 0.88 3704
B-PER 0.92 0.93 0.93 3884
micro avg 0.91 0.91 0.91 40672
macro avg 0.91 0.91 0.91 40672
weighted avg 0.91 0.91 0.91 40672
processed 64967 tokens with 11305 phrases; found: 11317 phrases; correct: 9984.
accuracy: 91.01%; (non-O)
accuracy: 93.30%; precision: 88.22%; recall: 88.31%; FB1: 88.27
LOC: precision: 86.64%; recall: 90.23%; FB1: 88.40 3871
ORG: precision: 86.90%; recall: 83.26%; FB1: 85.04 3549
PER: precision: 90.99%; recall: 91.30%; FB1: 91.15 3897
###############################
lang: yo
precision recall f1-score support
B-LOC 0.55 0.72 0.62 39
I-ORG 0.53 0.23 0.32 87
I-LOC 0.68 0.83 0.75 72
I-PER 0.82 0.66 0.73 71
B-ORG 0.50 0.28 0.36 29
B-PER 0.85 0.79 0.82 43
micro avg 0.68 0.58 0.62 341
macro avg 0.66 0.58 0.60 341
weighted avg 0.66 0.58 0.60 341
processed 503 tokens with 111 phrases; found: 107 phrases; correct: 66.
accuracy: 57.77%; (non-O)
accuracy: 71.17%; precision: 61.68%; recall: 59.46%; FB1: 60.55
LOC: precision: 52.94%; recall: 69.23%; FB1: 60.00 51
ORG: precision: 50.00%; recall: 27.59%; FB1: 35.56 16
PER: precision: 77.50%; recall: 72.09%; FB1: 74.70 40
###############################
lang: zh
precision recall f1-score support
B-LOC 0.76 0.84 0.80 4371
I-ORG 0.77 0.76 0.77 17399
I-LOC 0.78 0.86 0.82 12282
I-PER 0.87 0.87 0.87 12897
B-ORG 0.70 0.71 0.70 3779
B-PER 0.86 0.82 0.84 3899
micro avg 0.80 0.82 0.81 54627
macro avg 0.79 0.81 0.80 54627
weighted avg 0.80 0.82 0.81 54627
processed 207418 tokens with 12532 phrases; found: 12410 phrases; correct: 9536.
accuracy: 81.65%; (non-O)
accuracy: 91.91%; precision: 76.84%; recall: 76.09%; FB1: 76.47
LOC: precision: 74.87%; recall: 80.78%; FB1: 77.71 4827
ORG: precision: 71.48%; recall: 66.88%; FB1: 69.10 3850
PER: precision: 84.92%; recall: 80.40%; FB1: 82.60 3733