sparknlp.annotator.stemmer
Contains classes for the Stemmer.
Module Contents
Classes
- class Stemmer
Returns the hard stem of each word, with the objective of retrieving the meaningful part of the word.
For extended examples of usage, see the Examples.
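To give a feel for what reducing a word to a stem means, here is a minimal pure-Python sketch. It is a toy suffix stripper and does not reproduce the algorithm Spark NLP implements; the name `toy_stem` and the `SUFFIXES` rule set are hypothetical and exist only for illustration.

```python
# Toy suffix stripper: an illustrative assumption, NOT Spark NLP's algorithm.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "is" survive.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


tokens = ["Peter", "picking", "pecks", "peppers"]
stems = [toy_stem(t) for t in tokens]
print(stems)  # ['peter', 'pick', 'peck', 'pepper']
```

A real stemmer applies many more rules, so its output can differ from this sketch for most words.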
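====================== ======================
Input Annotation types Output Annotation type
====================== ======================
TOKEN                  TOKEN
====================== ======================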
- Parameters:
- None
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> stemmer = Stemmer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("stem")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     stemmer
... ])
>>> data = spark.createDataFrame([["Peter Pipers employees are picking pecks of pickled peppers."]]) \
...     .toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("stem.result").show(truncate=False)
+-------------------------------------------------------------+
|result                                                       |
+-------------------------------------------------------------+
|[peter, piper, employe, ar, pick, peck, of, pickl, pepper, .]|
+-------------------------------------------------------------+
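
Since the annotator takes TOKEN annotations in and puts TOKEN annotations out, the stems can be lined up with the tokens they came from. The sketch below does this in plain Python on lists such as one might collect from the DataFrame. The `tokens` list is an assumption about what the tokenizer produces for the example sentence (the documentation does not show it), and the positional alignment of the two lists is likewise assumed from the sample output.

```python
# Assumed tokenizer output for the example sentence (not shown in the docs).
tokens = ["Peter", "Pipers", "employees", "are", "picking",
          "pecks", "of", "pickled", "peppers", "."]
# Stems copied from the `stem.result` column in the example output.
stems = ["peter", "piper", "employe", "ar", "pick",
         "peck", "of", "pickl", "pepper", "."]

# Pair each token with its stem, assuming the two lists align by position.
pairs = list(zip(tokens, stems))

# Keep only the tokens that the stemmer changed beyond lowercasing.
changed = [(token, stem) for token, stem in pairs if token.lower() != stem]
print(changed)
```

Filtering on `token.lower() != stem` separates actual suffix removal from plain case folding, which can help when checking how aggressively the stemmer shortens a given vocabulary.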