Description
This model was imported from Hugging Face and fine-tuned on the Universal Dependencies Lassy dataset for Dutch. It leverages BERT embeddings and BertForTokenClassification for NER.
Predicted Entities
CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART
How to use
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_dutch_udlassy_ner", "nl")\
.setInputCols(["sentence", "token"])\
.setOutputCol("ner")
ner_converter = NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = """Mijn naam is Peter Fergusson. Ik woon sinds oktober 2011 in New York en werk 5 jaar bij Tesla Motor."""
result = model.transform(spark.createDataFrame([[text]]).toDF("text"))
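To display the extracted chunks and their labels (as in the Results section below), the ner_chunk annotations can be flattened with standard PySpark functions. A minimal sketch, assuming the pipeline output result from above:

import pyspark.sql.functions as F

# Each element of "ner_chunk" is an annotation struct; explode it and pull out
# the chunk text ("result") and its label ("entity" in the metadata map).
result.select(F.explode("ner_chunk").alias("chunk")) \
    .select(
        F.col("chunk.result").alias("chunk"),
        F.col("chunk.metadata").getItem("entity").alias("ner_label")
    ) \
    .show(truncate=False)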
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_dutch_udlassy_ner", "nl")
.setInputCols(Array("sentence", "token"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter))
val example = Seq("Mijn naam is Peter Fergusson. Ik woon sinds oktober 2011 in New York en werk 5 jaar bij Tesla Motor.").toDS.toDF("text")
val result = pipeline.fit(example).transform(example)
import nlu
nlu.load("nl.ner.bert").predict("""Mijn naam is Peter Fergusson. Ik woon sinds oktober 2011 in New York en werk 5 jaar bij Tesla Motor.""")
Results
+------------------------+---------+
|chunk |ner_label|
+------------------------+---------+
|Peter Fergusson |PERSON |
|oktober 2011 |DATE |
|New York |GPE |
|5 jaar |DATE |
|Tesla Motor |ORG |
+------------------------+---------+
Model Information
| Model Name: | bert_token_classifier_dutch_udlassy_ner |
| Compatibility: | Spark NLP 3.3.2+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [sentence, token] |
| Output Labels: | [ner] |
| Language: | nl |
| Case sensitive: | true |
| Max sentence length: | 256 |
Data Source
https://huggingface.co/wietsedv/bert-base-dutch-cased-finetuned-udlassy-ner