sparknlp.annotator.classifier_dl.bert_for_multiple_choice#

Module Contents#

Classes#

BertForMultipleChoice

BertForMultipleChoice can load BERT Models with a multiple choice classification head on top

class BertForMultipleChoice(classname='com.johnsnowlabs.nlp.annotators.classifier.dl.BertForMultipleChoice', java_model=None)[source]#

BertForMultipleChoice can load BERT Models with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

Pretrained models can be loaded with pretrained() of the companion object:

>>> multipleChoice = BertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer")

The default model is "bert_base_uncased_multiple_choice", if no name is provided.

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them see Import Transformers into Spark NLP 🚀.

Input Annotation types: DOCUMENT, DOCUMENT

Output Annotation type: CHUNK

Parameters:
batchSize

Batch size. Larger values allow faster processing but require more memory, by default 8

caseSensitive

Whether to match tokens case-sensitively for embeddings matching, by default False

maxSentenceLength

Max sentence length to process, by default 512

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = MultiDocumentAssembler() \
...     .setInputCols(["question", "context"]) \
...     .setOutputCols(["document_question", "document_context"])
>>> questionAnswering = BertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer") \
...     .setCaseSensitive(False)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     questionAnswering
... ])
>>> data = spark.createDataFrame([["The Eiffel Tower is located in which country?", "Germany, France, Italy"]]).toDF("question", "context")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("answer.result").show(truncate=False)
+--------------------+
|result              |
+--------------------+
|[France]            |
+--------------------+
name = 'BertForMultipleChoice'[source]#
inputAnnotatorTypes[source]#
outputAnnotatorType = 'chunk'[source]#
choicesDelimiter[source]#
setChoicesDelimiter(value)[source]#

Sets the delimiter character used to split the choices.

Parameters:
value : str

Delimiter character used to split the choices

static loadSavedModel(folder, spark_session)[source]#

Loads a locally saved model.

Parameters:
folder : str

Folder of the saved model

spark_session : pyspark.sql.SparkSession

The current SparkSession

Returns:
BertForMultipleChoice

The restored model

static pretrained(name='bert_base_uncased_multiple_choice', lang='en', remote_loc=None)[source]#

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default "bert_base_uncased_multiple_choice"

lang : str, optional

Language of the pretrained model, by default "en"

remote_loc : str, optional

Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.

Returns:
BertForMultipleChoice

The restored model