Basic NLP Pipeline for Spanish from TEMU-BSC for PlanTL

Description

Pretrained basic NLP pipeline by TEMU-BSC for PlanTL-GOB-ES. It performs tokenization, lemmatization, NER, embeddings, and normalization, and is built on the roberta_base_bne transformer.

How to use

import sparknlp
from sparknlp.base import LightPipeline
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Download the pretrained Spanish pipeline
pipeline = PretrainedPipeline("pipeline_bsc_roberta_base_bne", "es", "@cayorodriguez")

# Wrap the underlying PipelineModel for fast in-memory annotation
light_model = LightPipeline(pipeline.model)
text = "La Reserva Federal de el Gobierno de EE UU aprueba una de las mayores subidas de tipos de interés desde 1994."
light_result = light_model.annotate(text)

# PretrainedPipeline can also annotate a string directly
result = pipeline.annotate("Veo al hombre de los Estados Unidos con el telescopio")
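When called on a single string, annotate returns a dictionary that maps each output column name to a list of strings. The following is a minimal sketch for getting a quick overview of such a dictionary. The helper name summarize_annotations and the sample dictionary are hypothetical, and the column names "document" and "token" are assumptions, not the pipeline's confirmed output names.

```python
from typing import Dict, List


def summarize_annotations(result: Dict[str, List[str]], preview: int = 5) -> Dict[str, dict]:
    """Return, per output column, the annotation count and a short preview."""
    return {
        column: {"count": len(values), "preview": values[:preview]}
        for column, values in sorted(result.items())
    }


# Hypothetical stand-in for the dictionary returned by annotate().
# The keys and values here are illustrative only, not real pipeline output.
example_result = {
    "document": ["Veo al hombre"],
    "token": ["Veo", "al", "hombre"],
}

for column, info in summarize_annotations(example_result).items():
    print(column, info["count"], info["preview"])
```

With a loaded pipeline, the same helper could be applied to light_result or result from the snippet above to see which columns the pipeline actually produces.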

Model Information

Model Name:	pipeline_bsc_roberta_base_bne
Type:	pipeline
Compatibility:	Spark NLP 4.0.0+
License:	Open Source
Edition:	Community
Language:	es
Size:	2.0 GB
Dependencies:	roberta_base_bne

Included Models

DocumentAssembler
SentenceDetectorDLModel
TokenizerModel
NormalizerModel
StopWordsCleaner
RoBertaEmbeddings
SentenceEmbeddings
EmbeddingsFinisher
LemmatizerModel
RoBertaForTokenClassification
RoBertaForTokenClassification
NerConverter
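To check the stages listed above against a loaded pipeline, one option is to read the stages attribute of the underlying PipelineModel. The sketch below assumes only that the object passed in exposes a stages list, as a PySpark PipelineModel does. The helper names stage_class_names and repeated_stages are hypothetical, and the Python class names reported at runtime may not match the list above exactly.

```python
from collections import Counter
from typing import Any, Dict, List


def stage_class_names(pipeline_model: Any) -> List[str]:
    """Return the class name of every stage, in pipeline order."""
    return [type(stage).__name__ for stage in pipeline_model.stages]


def repeated_stages(names: List[str]) -> Dict[str, int]:
    """Return the stage classes that occur more than once, with their counts."""
    return {name: count for name, count in Counter(names).items() if count > 1}


# Usage with Spark NLP (needs a running session and the downloaded pipeline):
#   names = stage_class_names(pipeline.model)
#   print(names)
#   print(repeated_stages(names))
```

The second helper is a convenient way to spot annotators that occur more than once, such as the two RoBertaForTokenClassification entries in the list.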
