sparknlp.pretrained.resource_downloader#

Contains classes for the ResourceDownloader.

Module Contents#

Classes#

ResourceDownloader

Downloads and manages resources such as pretrained models and pipelines.

class ResourceDownloader[source]#

Downloads and manages resources such as pretrained models and pipelines.

Usually you will not need to use this class directly. It is called by the pretrained() function of annotators.

However, you can use this class to list the available pretrained resources.

Examples

If you want to list all NerDLModels for the English language, you can run:

>>> ResourceDownloader.showPublicModels("NerDLModel", "en")
+-------------+------+---------+
| Model       | lang | version |
+-------------+------+---------+
| onto_100    | en   | 2.1.0   |
| onto_300    | en   | 2.1.0   |
| ner_dl_bert | en   | 2.2.0   |
|  ...        | ...  | ...     |

Similarly for Pipelines:

>>> ResourceDownloader.showPublicPipelines("en")
+------------------+------+---------+
| Pipeline         | lang | version |
+------------------+------+---------+
| dependency_parse | en   | 2.0.2   |
| check_spelling   | en   | 2.1.0   |
| match_datetime   | en   | 2.1.0   |
|  ...             | ...  | ...     |
static downloadModel(reader, name, language, remote_loc=None, j_dwn='PythonResourceDownloader')[source]#

Downloads and loads a model with the default downloader. Usually this method does not need to be called directly, as it is called by the pretrained() method of the annotator.

Parameters:

reader : obj

Class to read the model for

name : str

Name of the pretrained model

language : str

Language of the model

remote_loc : str, optional

Directory of the remote Spark NLP folder, by default None

j_dwn : str, optional

Which Java downloader to use, by default 'PythonResourceDownloader'

Returns:
AnnotatorModel

Loaded pretrained annotator/pipeline

static downloadModelDirectly(name, remote_loc='public/models', unzip=True)[source]#

Downloads a model directly to the cache folder. You can use this method to download a model by copy-pasting its S3 URI from the Models Hub. For available S3 URIs and models, please see the Models Hub.

Parameters:

name : str

Name of the model or S3 URI

remote_loc : str, optional

Directory of the remote Spark NLP folder, by default "public/models"

unzip : bool, optional

Whether to unzip the model, by default True

static downloadPipeline(name, language, remote_loc=None)[source]#

Downloads and loads a pipeline with the default downloader.

Parameters:

name : str

Name of the pipeline

language : str

Language of the pipeline

remote_loc : str, optional

Directory of the remote Spark NLP folder, by default None

Returns:
PipelineModel

The loaded pipeline

static clearCache(name, language, remote_loc=None)[source]#

Clears the cache entry of a model.

Parameters:

name : str

Name of the model

language : str

Language of the model

remote_loc : str, optional

Directory of the remote Spark NLP folder, by default None

static showPublicModels(annotator=None, lang=None, version=None)[source]#

Prints all pretrained models for a particular annotator that are compatible with a version of Spark NLP. If any of the optional arguments are not set, that filter is not applied.

Parameters:

annotator : str, optional

Name of the annotator to filter, by default None

lang : str, optional

Language of the models to filter, by default None

version : str, optional

Version of Spark NLP to filter, by default None

static showPublicPipelines(lang=None, version=None)[source]#

Prints all pretrained pipelines that are compatible with a version of Spark NLP. If any of the optional arguments are not set, that filter is not applied.

Parameters:

lang : str, optional

Language of the pipelines to filter, by default None

version : str, optional

Version of Spark NLP to filter, by default None

static showUnCategorizedResources()[source]#

Shows models or pipelines in the metadata that have not been categorized yet.

static showAvailableAnnotators()[source]#

Shows all available annotators in Spark NLP.