sparknlp.annotator.stemmer

Contains classes for the Stemmer.

Module Contents

Classes

Stemmer

Returns the hard stems of words, with the objective of retrieving the meaningful part of each word.

class Stemmer

Returns the hard stems of words, with the objective of retrieving the meaningful part of each word.

For extended examples of usage, see the Examples.

Input Annotation types    Output Annotation type
TOKEN                     TOKEN

Parameters:
None

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> # Start (or reuse) a Spark session with Spark NLP available
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> stemmer = Stemmer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("stem")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     stemmer
... ])
>>> data = spark.createDataFrame([["Peter Pipers employees are picking pecks of pickled peppers."]]) \
...     .toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("stem.result").show(truncate=False)
+-------------------------------------------------------------+
|result                                                       |
+-------------------------------------------------------------+
|[peter, piper, employe, ar, pick, peck, of, pickl, pepper, .]|
+-------------------------------------------------------------+
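
To see which words the stemmer actually changed, it can help to line each token up with its stem. The following is a minimal plain-Python sketch, not part of the Spark NLP API: the helper name pair_tokens_with_stems is hypothetical, the stems are copied from the output above, and the token list is an assumption about what the Tokenizer stage produces for this sentence. It also assumes the stemmer emits exactly one stem per token, in order.

```python
def pair_tokens_with_stems(tokens, stems):
    """Hypothetical helper (not a Spark NLP function): pair each token with its stem."""
    if len(tokens) != len(stems):
        raise ValueError("tokens and stems must have the same length")
    return list(zip(tokens, stems))


# Assumed Tokenizer output for the example sentence (not shown in the docs above).
tokens = ["Peter", "Pipers", "employees", "are", "picking",
          "pecks", "of", "pickled", "peppers", "."]
# Stems copied from the "stem.result" column shown above.
stems = ["peter", "piper", "employe", "ar", "pick",
         "peck", "of", "pickl", "pepper", "."]

pairs = pair_tokens_with_stems(tokens, stems)

# Keep only the pairs where stemming did more than lowercase the token.
changed = [(token, stem) for token, stem in pairs if token.lower() != stem]

for token, stem in changed:
    print(f"{token} -> {stem}")
```

In a real pipeline the two lists would come from the token.result and stem.result columns of the same row, for example after collecting a small DataFrame to the driver.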