sparknlp.annotator.classifier_dl.bert_for_multiple_choice#

Module Contents#

Classes#

BertForMultipleChoice

BertForMultipleChoice can load BERT Models with a multiple choice classification head on top

class BertForMultipleChoice(classname='com.johnsnowlabs.nlp.annotators.classifier.dl.BertForMultipleChoice', java_model=None)[source]#

BertForMultipleChoice can load BERT Models with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

Pretrained models can be loaded with pretrained() of the companion object:

>>> multipleChoice = BertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer")

The default model is "bert_base_uncased_multiple_choice", if no name is provided.

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them see Import Transformers into Spark NLP 🚀.

Input Annotation types: DOCUMENT, DOCUMENT

Output Annotation type: CHUNK

Parameters:
batchSize

Batch size. Larger values allow faster processing but require more memory, by default 8

caseSensitive

Whether to match tokens case-sensitively for embeddings matching, by default False

maxSentenceLength

Max sentence length to process, by default 512

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = MultiDocumentAssembler() \
...     .setInputCols(["question", "context"]) \
...     .setOutputCols(["document_question", "document_context"])
>>> questionAnswering = BertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer") \
...     .setCaseSensitive(False)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     questionAnswering
... ])
>>> data = spark.createDataFrame([["The Eiffel Tower is located in which country?", "Germany, France, Italy"]]).toDF("question", "context")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("answer.result").show(truncate=False)
+--------------------+
|result              |
+--------------------+
|[France]            |
+--------------------+
name = 'BertForMultipleChoice'[source]#
inputAnnotatorTypes[source]#
outputAnnotatorType = 'chunk'[source]#
choicesDelimiter[source]#
setChoicesDelimiter(value)[source]#

Sets the delimiter character used to split the choices.

Parameters:
value : str

Delimiter character used to split the choices

static loadSavedModel(folder, spark_session)[source]#

Loads a locally saved model.

Parameters:
folder : str

Folder of the saved model

spark_session : pyspark.sql.SparkSession

The current SparkSession

Returns:
BertForMultipleChoice

The restored model

static pretrained(name='bert_base_uncased_multiple_choice', lang='en', remote_loc=None)[source]#

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default "bert_base_uncased_multiple_choice"

lang : str, optional

Language of the pretrained model, by default "en"

remote_loc : str, optional

Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.

Returns:
BertForMultipleChoice

The restored model