sparknlp.annotator.classifier_dl.albert_for_multiple_choice

Module Contents

Classes
AlbertForMultipleChoice can load ALBERT Models with a multiple choice classification head on top.
- class AlbertForMultipleChoice(classname='com.johnsnowlabs.nlp.annotators.classifier.dl.AlbertForMultipleChoice', java_model=None)[source]
AlbertForMultipleChoice can load ALBERT Models with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
Pretrained models can be loaded with pretrained() of the companion object:

>>> spanClassifier = AlbertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer")
The default model is "albert_base_uncased_multiple_choice", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see Import Transformers into Spark NLP 🚀.
Input Annotation types
Output Annotation type
DOCUMENT, DOCUMENT
CHUNK
- Parameters:
- batchSize
Batch size. Larger values allow faster processing but require more memory, by default 8
- caseSensitive
Whether to ignore case in tokens for embeddings matching, by default False
- maxSentenceLength
Max sentence length to process, by default 512
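A minimal sketch of working with these defaults. The dictionary below simply restates the documented defaults; the commented setter calls are illustrative and require a running SparkSession with Spark NLP loaded.

```python
# Documented defaults, collected for reference:
defaults = {"batchSize": 8, "caseSensitive": False, "maxSentenceLength": 512}

# Overriding them on the annotator (illustrative; requires a SparkSession):
# questionAnswering = AlbertForMultipleChoice.pretrained() \
#     .setBatchSize(4) \
#     .setCaseSensitive(True) \
#     .setMaxSentenceLength(256)

print(defaults)
```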
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = MultiDocumentAssembler() \
...     .setInputCols(["question", "context"]) \
...     .setOutputCols(["document_question", "document_context"])
>>> questionAnswering = AlbertForMultipleChoice.pretrained() \
...     .setInputCols(["document_question", "document_context"]) \
...     .setOutputCol("answer") \
...     .setCaseSensitive(False)
>>> pipeline = Pipeline().setStages([
...     documentAssembler,
...     questionAnswering
... ])
>>> data = spark.createDataFrame([["The Eiffel Tower is located in which country?", "Germany, France, Italy"]]).toDF("question", "context")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("answer.result").show(truncate=False)
+--------------------+
|result              |
+--------------------+
|[France]            |
+--------------------+
- setChoicesDelimiter(value)[source]
Sets the delimiter character used to split the choices.
- Parameters:
- value : str
Delimiter character used to split the choices
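The choices are passed in as a single context string and split on this delimiter before scoring. A minimal sketch of the split itself, in plain Python; the commented annotator call is illustrative and requires a SparkSession.

```python
# One context string holds all candidate answers, separated by the delimiter:
context = "Germany, France, Italy"
delimiter = ","
choices = [c.strip() for c in context.split(delimiter)]
print(choices)  # ['Germany', 'France', 'Italy']

# Configuring the delimiter on the annotator (illustrative):
# questionAnswering = AlbertForMultipleChoice.pretrained() \
#     .setChoicesDelimiter(",")
```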
- static loadSavedModel(folder, spark_session)[source]
Loads a locally saved model.
- Parameters:
- folder : str
Folder of the saved model
- spark_session : pyspark.sql.SparkSession
The current SparkSession
- Returns:
- AlbertForMultipleChoice
The restored model
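A sketch of the typical loadSavedModel workflow, assuming a model has already been exported (e.g. from Hugging Face) into a local folder. The folder path is hypothetical, and the Spark NLP calls are shown commented out because they require an active SparkSession.

```python
import os

# Hypothetical folder with a previously exported ALBERT multiple-choice model:
export_path = "/tmp/albert_for_multiple_choice"

# Load it with the active SparkSession (illustrative):
# spanClassifier = AlbertForMultipleChoice.loadSavedModel(export_path, spark) \
#     .setInputCols(["document_question", "document_context"]) \
#     .setOutputCol("answer")

# The restored annotator can then be saved in Spark NLP format for reuse:
save_path = os.path.join(export_path, "spark_nlp")
# spanClassifier.write().overwrite().save(save_path)
print(save_path)
```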
- static pretrained(name='albert_base_uncased_multiple_choice', lang='en', remote_loc=None)[source]
Downloads and loads a pretrained model.
- Parameters:
- name : str, optional
Name of the pretrained model, by default "albert_base_uncased_multiple_choice"
- lang : str, optional
Language of the pretrained model, by default "en"
- remote_loc : str, optional
Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.
- Returns:
- AlbertForMultipleChoice
The restored model