Description
Pretrained Named Entity Recognition (NER) model, originally uploaded to Hugging Face, then adapted and imported into Spark NLP. roberta-base-bne-conll-ner_spark_nlp is a Spanish model originally trained by TEMU-BSC for PlanTL-GOB-ES.
Predicted Entities
How to use
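from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, RoBertaForTokenClassification

# Assumes an active SparkSession named `spark` (for example, one returned by sparknlp.start())
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Split each document into sentences
sentenceDetector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
ner = RoBertaForTokenClassification.pretrained("roberta_base_bne_conll_ner_spark_nlp", "es") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("ner")
# sentenceDetector must be a stage: the tokenizer and the NER model both read the "sentence" column
pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, ner])
data = spark.createDataFrame([["El Plan Nacional para el Impulso de las Tecnologías del Lenguaje es una iniciativa del Gobierno de España"]]).toDF("text")
result = pipeline.fit(data).transform(data)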
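The pipeline writes one tag per token to the `ner` column. As a minimal sketch of how those tags might be consumed, the helper below merges token-level tags into entity spans. It assumes the tags follow the IOB2 scheme (`B-`, `I-`, `O`), and the labels `ORG` and `LOC` are illustrative only, since this card does not list the predicted entities. `group_entities` is a hypothetical helper, not part of Spark NLP, and the example tags are hand-written rather than actual model output.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper: assumes one tag per token, tags shaped like
    "B-XXX", "I-XXX" or "O".
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    # A trailing ("", "O") sentinel flushes an entity that ends the sequence.
    for token, tag in list(zip(tokens, tags)) + [("", "O")]:
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if not continues:
            # Close the entity that was open, if any.
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
            # A B- tag (or a stray I- tag, handled leniently) opens a new entity.
            if prefix in ("B", "I"):
                label = kind
        if prefix in ("B", "I"):
            words.append(token)
    return entities


# Hand-written tokens and tags for illustration (not real model output).
example_tokens = ["Gobierno", "de", "España", "en", "Madrid"]
example_tags = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]
example_entities = group_entities(example_tokens, example_tags)
# example_entities == [("Gobierno de España", "ORG"), ("Madrid", "LOC")]
```

In practice the two input lists could come from the `token.result` and `ner.result` fields of a collected row of `result`. Spark NLP's own `NerConverter` annotator performs this kind of grouping inside the pipeline and is likely the better choice for production use.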
Model Information
| Model Name: | roberta_base_bne_conll_ner_spark_nlp |
| Compatibility: | Spark NLP 4.0.0+ |
| License: | Open Source |
| Edition: | Community |
| Input Labels: | [document, token] |
| Output Labels: | [ner] |
| Language: | es |
| Size: | 447.3 MB |
| Case sensitive: | true |
| Max sentence length: | 128 |