Pretrained Pipelines

Spark NLP offers a variety of pretrained pipelines that help you get started and get a sense of how the library works. We are constantly working on improving the available content.

Downloading and using a pretrained pipeline

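This example uses Explain Document ML ("explain_document_ml"), a pretrained pipeline that runs several common NLP steps in one pass: sentence detection, tokenization, spell checking, lemmatization, stemming, and part-of-speech tagging.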

A pretrained pipeline can be used either as a Spark ML Pipeline or as a Spark NLP LightPipeline.

Note that the first time you run the code below, it may take longer because the pretrained pipeline is downloaded from our servers.

>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_ml")
explain_document_ml download started this may take some time.
Approx size to download 9.1 MB
[OK!]

As a Spark ML Pipeline

>>> data = spark.createDataFrame([["We are very happy about Spark NLP"]]).toDF("text")
>>> result = explain_document_pipeline.model.transform(data)
>>> result.show()
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|                text|            document|            sentence|               token|               spell|              lemmas|               stems|                 pos|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|We are very happy...|[[document, 0, 32...|[[document, 0, 32...|[[token, 0, 1, We...|[[token, 0, 1, We...|[[token, 0, 1, We...|[[token, 0, 1, we...|[[pos, 0, 1, PRP,...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
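Each truncated cell above holds a list of annotations, and the visible prefixes (for example `[token, 0, 1, We...` and `[document, 0, 32...`) suggest the layout: annotator type, begin offset, end offset, result. The offsets appear to be inclusive character positions, since the 33-character sentence spans 0 to 32. The pure-Python sketch below only illustrates that offset convention. It assumes a plain whitespace split as a stand-in for the pipeline's tokenizer, and `token_spans` is a hypothetical helper, not part of Spark NLP.

```python
import re

text = "We are very happy about Spark NLP"


def token_spans(text):
    """Return (begin, end, token) triples with an inclusive end offset.

    Hypothetical helper: a whitespace split stands in for the real tokenizer.
    """
    return [(m.start(), m.end() - 1, m.group()) for m in re.finditer(r"\S+", text)]


spans = token_spans(text)

# Because the end offset is inclusive, slicing back into the text needs end + 1.
for begin, end, token in spans:
    assert text[begin:end + 1] == token

print(spans[0])
print(spans[-1])
```

With an inclusive end offset, the length of an annotation is `end - begin + 1`, which is worth keeping in mind when mapping results back to the original text.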

As a Spark NLP LightPipeline

>>> explain_document_pipeline.annotate("We are very happy about Spark NLP")
{'document': ['We are very happy about Spark NLP'],
 'lemmas': ['We', 'be', 'very', 'happy', 'about', 'Spark', 'NLP'],
 'pos': ['PRP', 'VBP', 'RB', 'JJ', 'IN', 'NNP', 'NNP'],
 'sentence': ['We are very happy about Spark NLP'],
 'spell': ['We', 'are', 'very', 'happy', 'about', 'Spark', 'NLP'],
 'stems': ['we', 'ar', 'veri', 'happi', 'about', 'spark', 'nlp'],
 'token': ['We', 'are', 'very', 'happy', 'about', 'Spark', 'NLP']}
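The result above is a plain Python dictionary that maps each output column to a list of strings. A minimal sketch of how it might be consumed follows, using the output shown above as a literal so that it runs without Spark. It assumes the token-level lists line up position by position, which holds in this example because every list has seven entries; that alignment is not guaranteed for every pipeline or input.

```python
# Output of annotate() copied from the example above.
annotations = {
    "document": ["We are very happy about Spark NLP"],
    "lemmas": ["We", "be", "very", "happy", "about", "Spark", "NLP"],
    "pos": ["PRP", "VBP", "RB", "JJ", "IN", "NNP", "NNP"],
    "sentence": ["We are very happy about Spark NLP"],
    "spell": ["We", "are", "very", "happy", "about", "Spark", "NLP"],
    "stems": ["we", "ar", "veri", "happi", "about", "spark", "nlp"],
    "token": ["We", "are", "very", "happy", "about", "Spark", "NLP"],
}

# Pair each token with its part-of-speech tag and lemma by position.
rows = list(zip(annotations["token"], annotations["pos"], annotations["lemmas"]))

for token, tag, lemma in rows:
    print(f"{token:<6} {tag:<4} {lemma}")

# Example query: collect the tokens tagged as proper nouns (NNP).
proper_nouns = [token for token, tag, _ in rows if tag == "NNP"]
```

Working on the dictionary directly keeps this kind of post-processing in ordinary Python, with no DataFrame operations needed.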

Available Pipelines

Please see the Pipelines Page for all available pipelines.

You can also list the available pretrained pipelines with ResourceDownloader.showPublicPipelines(), optionally filtered by language:

>>> from sparknlp.pretrained import ResourceDownloader
>>> ResourceDownloader.showPublicPipelines("en")
+------------------+------+---------+
| Pipeline         | lang | version |
+------------------+------+---------+
| dependency_parse | en   | 2.0.2   |
| check_spelling   | en   | 2.1.0   |
| match_datetime   | en   | 2.1.0   |
|  ...             | ...  | ...     |