`sparknlp.annotator.classifier_dl.roberta_for_question_answering`

Module Contents

Classes

RoBertaForQuestionAnswering

RoBertaForQuestionAnswering can load RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD.

class RoBertaForQuestionAnswering(classname='com.johnsnowlabs.nlp.annotators.classifier.dl.RoBertaForQuestionAnswering', java_model=None)

RoBertaForQuestionAnswering can load RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
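To make the role of the span start and span end logits concrete, here is a minimal sketch of how an answer span can be chosen from them. It is an illustration only, not Spark NLP's actual decoding code: the helper `best_span`, the `max_answer_len` cutoff, and the logit values are all made up for the example.

```python
from typing import List, Tuple


def best_span(start_logits: List[float],
              end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Hypothetical helper: scores every span with start <= end as
    start_logit + end_logit and keeps the best one.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider spans that end at or after the start token.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Made-up logits for the context sentence used in the example on this page.
tokens = ["My", "name", "is", "Clara", "and", "I", "live", "in", "Berkeley", "."]
start_logits = [0.1, 0.0, 0.2, 5.0, 0.1, 0.0, 0.1, 0.2, 1.5, 0.0]
end_logits = [0.0, 0.1, 0.1, 4.5, 0.2, 0.0, 0.1, 0.1, 2.0, 0.3]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # Clara
```

Requiring `start <= end` matters: taking the two argmax positions independently can yield an end token that precedes the start token.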

Pretrained models can be loaded with the static pretrained() method:

>>> spanClassifier = RoBertaForQuestionAnswering.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer")

The default model is "roberta_base_qa_squad2", if no name is provided.

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them see Import Transformers into Spark NLP 🚀.

Input Annotation types	Output Annotation type
`DOCUMENT, DOCUMENT`	`CHUNK`

Parameters:

batchSize: Batch size. Larger values allow faster processing but require more memory, by default 8
caseSensitive: Whether to ignore case in tokens for embeddings matching, by default False
configProtoBytes: ConfigProto from tensorflow, serialized into byte array.
maxSentenceLength: Max sentence length to process, by default 128
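The parameters above are set through the annotator's setters. The sketch below assumes the usual Spark NLP setter names `setBatchSize`, `setMaxSentenceLength`, and `setCaseSensitive`; the wrapper `apply_qa_settings` is a hypothetical helper introduced only for illustration, and its defaults mirror the values listed above.

```python
def apply_qa_settings(annotator,
                      batch_size: int = 8,
                      max_sentence_length: int = 128,
                      case_sensitive: bool = False):
    """Apply the documented parameters to a question-answering annotator.

    Hypothetical helper: `annotator` is expected to expose the setters
    setBatchSize, setMaxSentenceLength and setCaseSensitive.
    """
    annotator.setBatchSize(batch_size)                   # larger: faster, more memory
    annotator.setMaxSentenceLength(max_sentence_length)  # max sentence length to process
    annotator.setCaseSensitive(case_sensitive)
    return annotator


# Intended usage (requires a running Spark NLP session):
#   from sparknlp.annotator import RoBertaForQuestionAnswering
#   spanClassifier = apply_qa_settings(
#       RoBertaForQuestionAnswering.pretrained(), batch_size=16)
```

A larger `batch_size` trades memory for throughput, so it is worth raising only while the executors have headroom.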

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> documentAssembler = MultiDocumentAssembler() \
...     .setInputCols(["question", "context"]) \
...     .setOutputCols(["document_question", "document_context"])
>>> spanClassifier = RoBertaForQuestionAnswering.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer") \
...     .setCaseSensitive(False)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     spanClassifier
... ])
>>> data = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("answer.result").show(truncate=False)
+--------------------+
|result              |
+--------------------+
|[Clara]             |
+--------------------+

name = 'RoBertaForQuestionAnswering'

inputAnnotatorTypes

outputAnnotatorType = 'chunk'

configProtoBytes

coalesceSentences

setConfigProtoBytes(b)

Sets configProto from tensorflow, serialized into byte array.

Parameters:

b (List[int]): ConfigProto from tensorflow, serialized into byte array

static loadSavedModel(folder, spark_session)

Loads a locally saved model.

Parameters:

folder (str): Folder of the saved model
spark_session (pyspark.sql.SparkSession): The current SparkSession

Returns:

RoBertaForQuestionAnswering: The restored model
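A possible way to use loadSavedModel is to import an exported model once, save it in Spark NLP's own format, and reload it from there afterwards. The sketch below is written under assumptions: the function name `import_and_save` and both directory arguments are hypothetical, and the layout of the exported folder is not described on this page. The imports sit inside the function so that the definition itself has no dependencies.

```python
def import_and_save(export_dir: str, save_dir: str):
    """Import a locally saved model and persist it as a Spark NLP model.

    Hypothetical helper: `export_dir` is the folder of the saved model and
    `save_dir` is where the Spark NLP version is written.
    """
    import sparknlp
    from sparknlp.annotator import RoBertaForQuestionAnswering

    spark = sparknlp.start()

    # loadSavedModel(folder, spark_session) restores the model from disk.
    span_classifier = RoBertaForQuestionAnswering.loadSavedModel(export_dir, spark) \
        .setInputCols(["document_question", "document_context"]) \
        .setOutputCol("answer")

    # Persist with the standard Spark ML writer, then reload from that path.
    span_classifier.write().overwrite().save(save_dir)
    return RoBertaForQuestionAnswering.load(save_dir)
```

Saving once and reloading with `load` avoids repeating the import step in every job.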

static pretrained(name='roberta_base_qa_squad2', lang='en', remote_loc=None)

Downloads and loads a pretrained model.

Parameters:

name (str, optional): Name of the pretrained model, by default "roberta_base_qa_squad2"
lang (str, optional): Language of the pretrained model, by default "en"
remote_loc (str, optional): Optional remote address of the resource, by default None. Spark NLP's repositories are used otherwise.

Returns:

RoBertaForQuestionAnswering: The restored model
