sparknlp.annotator.stemmer
Contains classes for the Stemmer.
Module Contents
Classes
class Stemmer
Returns hard-stems out of words with the objective of retrieving the meaningful part of the word.
For extended examples of usage, see the Examples.
Input Annotation types: TOKEN
Output Annotation type: TOKEN

Parameters:
- None
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()  # create the Spark session used below
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> stemmer = Stemmer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("stem")
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     tokenizer,
...     stemmer
... ])
>>> data = spark.createDataFrame([["Peter Pipers employees are picking pecks of pickled peppers."]]) \
...     .toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.selectExpr("stem.result").show(truncate=False)
+-------------------------------------------------------------+
|result                                                       |
+-------------------------------------------------------------+
|[peter, piper, employe, ar, pick, peck, of, pickl, pepper, .]|
+-------------------------------------------------------------+
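To give a feel for what a "hard stem" is, the following is a minimal pure-Python sketch of suffix stripping. It is an illustration only and is not the algorithm that Stemmer implements: the rule list `SUFFIXES` and the helper `toy_stem` are hypothetical names invented for this sketch, and the real annotator applies more rules (for instance, it reduces "are" to "ar" in the example above, which this toy version would leave unchanged).

```python
# Toy suffix-stripping stemmer. Illustration only: this is NOT the
# algorithm used by Spark NLP's Stemmer annotator.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, checked in order


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix,
    provided at least three characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


words = ["Peter", "Pipers", "employees", "picking", "pecks", "pickled", "peppers"]
stems = [toy_stem(w) for w in words]
print(stems)
```

As with the annotator's output, the results need not be dictionary words ("employe", "pickl"); a stem is only meant to be a shared root that groups inflected forms together.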