Description
Pipeline for Catalan language processing, built on models from the Barcelona Supercomputing Center and the AINA project (Generalitat de Catalunya). It follows the POS and tokenization guidelines of the AnCora Universal Dependencies corpus.
How to use
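from sparknlp.pretrained import PretrainedPipeline  # needs an active Spark NLP session, e.g. sparknlp.start()
pipeline = PretrainedPipeline("pipeline_md", "ca", "@cayorodriguez")
result = pipeline.annotate("El català ja és a SparkNLP.")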
Results
{'chunk': ['El català ja', 'SparkNLP', 'és'],
'entities': ['SparkNLP'],
'lemma': ['el', 'català', 'ja', 'ser', 'a', 'sparknlp', '.'],
'document': ['El català ja és a SparkNLP.'],
'pos': ['DET', 'NOUN', 'ADV', 'AUX', 'ADP', 'PROPN', 'PUNCT'],
'sentence_embeddings': ['El català ja és a SparkNLP.'],
'cleanTokens': ['català', 'SparkNLP', '.'],
'token': ['El', 'català', 'ja', 'és', 'a', 'SparkNLP', '.'],
'ner': ['O', 'O', 'O', 'O', 'O', 'B-ORG', 'O'],
'embeddings': ['El', 'català', 'ja', 'és', 'a', 'SparkNLP', '.'],
'form': ['el', 'català', 'ja', 'és', 'a', 'sparknlp', '.'],
'sentence': ['El català ja és a SparkNLP.']}
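The token-level keys in this output (`token`, `lemma`, `pos`, `ner`) each hold seven entries, one per token, so they can be read side by side. The sketch below assumes that alignment, which holds for this example but is not guaranteed for every input. It uses a plain dictionary copied from the results above instead of a live pipeline call. The helper name `align_annotations` is hypothetical and not part of Spark NLP.

```python
# Subset of the result dictionary shown above, copied verbatim.
result = {
    "token": ["El", "català", "ja", "és", "a", "SparkNLP", "."],
    "lemma": ["el", "català", "ja", "ser", "a", "sparknlp", "."],
    "pos": ["DET", "NOUN", "ADV", "AUX", "ADP", "PROPN", "PUNCT"],
    "ner": ["O", "O", "O", "O", "O", "B-ORG", "O"],
}


def align_annotations(result, keys=("token", "lemma", "pos", "ner")):
    """Zip parallel annotation lists into one tuple per token.

    Hypothetical helper: raises if the lists are not the same length,
    since the one-entry-per-token alignment is only an assumption.
    """
    columns = [result[key] for key in keys]
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        raise ValueError(f"annotation lists differ in length: {sorted(lengths)}")
    return list(zip(*columns))


rows = align_annotations(result)
for token, lemma, pos, ner in rows:
    print(f"{token:10} {lemma:10} {pos:6} {ner}")
```

Keys such as `chunk`, `entities`, and `cleanTokens` have a different number of entries and are left out of the alignment on purpose.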
Model Information
| Model Name: | pipeline_md |
| Type: | pipeline |
| Compatibility: | Spark NLP 3.4.4+ |
| License: | Open Source |
| Edition: | Community |
| Language: | ca |
| Size: | 756.1 MB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- NormalizerModel
- StopWordsCleaner
- RoBertaEmbeddings
- SentenceEmbeddings
- EmbeddingsFinisher
- LemmatizerModel
- PerceptronModel
- RoBertaForTokenClassification
- NerConverter
- Chunker
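Among the stages listed above, the token classifier emits one IOB tag per token (the `ner` output) and NerConverter turns those tags into entity spans (the `entities` output). The sketch below illustrates that kind of grouping in plain Python. It is an illustration of the general IOB convention, not Spark NLP's implementation, and the function name `group_iob` is hypothetical.

```python
def group_iob(tokens, tags):
    """Merge IOB-tagged tokens into (text, label) entity spans.

    Illustrative only: a B- tag starts a new entity, a matching I- tag
    extends it, and anything else closes the current entity.
    """
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label and current_tokens:
            current_tokens.append(token)
        elif prefix in ("B", "I"):
            # B- tag, or an I- tag with no matching open entity: start fresh.
            flush()
            current_tokens, current_label = [token], label
        else:
            # "O" tag: close any open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


tokens = ["El", "català", "ja", "és", "a", "SparkNLP", "."]
tags = ["O", "O", "O", "O", "O", "B-ORG", "O"]
entities = group_iob(tokens, tags)
print(entities)
```

For the example sentence this yields a single `ORG` span for `SparkNLP`, matching the `entities` value in the results.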