sparknlp.annotator.classifier_dl.mpnet_for_question_answering

Module Contents

Classes

MPNetForQuestionAnswering

MPNetForQuestionAnswering can load MPNet models with a span classification head on top for extractive question-answering tasks like SQuAD.

class MPNetForQuestionAnswering(classname='com.johnsnowlabs.nlp.annotators.classifier.dl.MPNetForQuestionAnswering', java_model=None)

MPNetForQuestionAnswering can load MPNet models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output that computes span start logits and span end logits).
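The head produces only a start score and an end score per token, so an answer still has to be decoded from them. The following is a minimal sketch of one common SQuAD-style decoding rule, assuming the answer is the span that maximizes start score plus end score subject to start <= end and a length cap. Spark NLP's internal decoding may differ; `best_span`, `max_answer_len`, and the logit values are hypothetical illustrations, not part of the library.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Hypothetical helper: return (start, end) token indices, end inclusive,
    maximizing start_logits[start] + end_logits[end] with start <= end and
    end - start + 1 <= max_answer_len (a common heuristic, assumed here)."""
    if len(start_logits) != len(end_logits) or not start_logits:
        raise ValueError("logit lists must be non-empty and of equal length")
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Illustrative tokens and made-up logits, not real model output
tokens = ["my", "name", "is", "clara", "and", "i", "live", "in", "berkeley"]
start = [0.1, 0.2, 0.0, 5.0, 0.3, 0.1, 0.0, 0.2, 1.0]
end = [0.0, 0.1, 0.2, 4.5, 0.2, 0.0, 0.1, 0.3, 2.0]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # clara
```

Scoring start and end jointly, rather than taking two independent argmaxes, guarantees the decoded span never ends before it starts.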

Pretrained models can be loaded with the static pretrained() method:

>>> spanClassifier = MPNetForQuestionAnswering.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer")

If no name is provided, the default model is "mpnet_base_question_answering_squad2".

For available pretrained models, please see the Models Hub.

To see which models are compatible and how to import them, see Import Transformers into Spark NLP 🚀.

Input Annotation types: DOCUMENT, DOCUMENT

Output Annotation type: CHUNK
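The output is a CHUNK annotation rather than a bare string. A minimal sketch of how a predicted token span could be mapped back to character offsets in the context, assuming (1) the chunk's offsets index into the context document and (2) offsets use an inclusive end, as Spark NLP annotations generally do. The `Chunk` class, the whitespace tokenizer, and `span_to_chunk` are hypothetical simplifications; MPNet itself uses subword tokenization.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    # Hypothetical stand-in for a CHUNK annotation: answer text plus
    # character offsets into the context, with an inclusive end offset (assumed)
    result: str
    begin: int
    end: int


def token_offsets(text: str) -> List[Tuple[int, int]]:
    # Simplified whitespace tokenizer returning (begin, end_inclusive) per token
    return [(m.start(), m.end() - 1) for m in re.finditer(r"\S+", text)]


def span_to_chunk(context: str, start_tok: int, end_tok: int) -> Chunk:
    # Convert a token span (end inclusive) into a character-level chunk
    offsets = token_offsets(context)
    begin = offsets[start_tok][0]
    end = offsets[end_tok][1]
    return Chunk(result=context[begin:end + 1], begin=begin, end=end)


context = "My name is Clara and I live in Berkeley."
print(span_to_chunk(context, 3, 3))  # Chunk(result='Clara', begin=11, end=15)
```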

Parameters:
batchSize

Batch size. Larger values allow faster processing but require more memory, by default 8

caseSensitive

Whether token matching for embeddings is case sensitive, by default False

maxSentenceLength

Max sentence length to process, by default 128
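To make the batchSize and maxSentenceLength trade-offs concrete, here is a minimal conceptual sketch in plain Python. It assumes rows are grouped into consecutive batches and that, for a question/context pair, only the context is truncated so that the question, the context, and the special tokens fit within the length limit. The helpers `batches` and `fit_pair` are hypothetical, and `special_tokens=4` assumes a RoBERTa-style `<s> q </s></s> c </s>` pair layout; Spark NLP's actual truncation logic may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(rows: Sequence[T], batch_size: int = 8) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive groups of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(rows), batch_size):
        # Each group stands for one forward pass: larger groups mean fewer
        # passes, but more memory held at once.
        yield list(rows[i:i + batch_size])


def fit_pair(question: List[str], context: List[str],
             max_sentence_length: int = 128, special_tokens: int = 4) -> List[str]:
    """Hypothetical helper: truncate only the context so that question + context
    + special tokens fit into max_sentence_length (4 special tokens assumed)."""
    budget = max_sentence_length - len(question) - special_tokens
    if budget <= 0:
        raise ValueError("question alone exceeds max_sentence_length")
    return context[:budget]


sizes = [len(b) for b in batches(list(range(20)), batch_size=8)]
print(sizes)  # [8, 8, 4]
kept = fit_pair(["what's", "my", "name", "?"], ["tok"] * 200, max_sentence_length=128)
print(len(kept))  # 120
```

Under these assumptions, any context tokens beyond the budget are never seen by the model, so an answer located late in a long context can be missed unless maxSentenceLength is raised.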

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = MultiDocumentAssembler() \
...     .setInputCols(["question", "context"]) \
...     .setOutputCols(["document_question", "document_context"])
>>> spanClassifier = MPNetForQuestionAnswering.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer") \
...     .setCaseSensitive(False)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     spanClassifier
... ])
>>> data = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("answer.result").show(truncate=False)
+--------------------+
|result              |
+--------------------+
|[Clara]             |
+--------------------+

static loadSavedModel(folder, spark_session)

Loads a locally saved model.

Parameters:
folder : str

Folder of the saved model

spark_session : pyspark.sql.SparkSession

The current SparkSession

Returns:
MPNetForQuestionAnswering

The restored model

static pretrained(name='mpnet_base_question_answering_squad2', lang='en', remote_loc=None)

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default "mpnet_base_question_answering_squad2"

lang : str, optional

Language of the pretrained model, by default "en"

remote_loc : str, optional

Optional remote address of the resource, by default None, in which case Spark NLP's repositories are used.

Returns:
MPNetForQuestionAnswering

The restored model