package dl
Type Members
- class AlbertForQuestionAnswering extends AnnotatorModel[AlbertForQuestionAnswering] with HasBatchedAnnotate[AlbertForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
AlbertForQuestionAnswering can load ALBERT Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = AlbertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "albert_base_qa_squad2", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see AlbertForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = AlbertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
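Since the class mixes in HasBatchedAnnotate, throughput and input truncation can be tuned on the loaded annotator. A minimal sketch (the parameter values below are illustrative, not documented defaults):

// Hedged sketch: setBatchSize comes from HasBatchedAnnotate; setMaxSentenceLength
// caps the combined question+context length in tokens (longer inputs are truncated).
val tunedQA = AlbertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setBatchSize(8)           // rows annotated per batch (illustrative)
  .setMaxSentenceLength(384) // token cap (illustrative value)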
- See also
AlbertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class AlbertForSequenceClassification extends AnnotatorModel[AlbertForSequenceClassification] with HasBatchedAnnotate[AlbertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
AlbertForSequenceClassification can load ALBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = AlbertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "albert_base_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and for more extended examples, see AlbertForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
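The label set of a loaded model can be inspected before building the pipeline. A minimal sketch, reusing the sequenceClassifier from the example above (getClasses returns the labels stored with the pretrained model):

// Print the classes this model can predict, e.g. pos/neg for the default IMDB model.
val classes: Array[String] = sequenceClassifier.getClasses
classes.foreach(println)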
- See also
AlbertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class AlbertForTokenClassification extends AnnotatorModel[AlbertForTokenClassification] with HasBatchedAnnotate[AlbertForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
AlbertForTokenClassification can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = AlbertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "albert_base_token_classifier_conll03", if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the AlbertForTokenClassificationTestSpec. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = AlbertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
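Since the output annotations are IOB-tagged named entities, they can be folded into whole entity chunks with NerConverter. A minimal sketch, reusing the stages from the example above:

import com.johnsnowlabs.nlp.annotators.ner.NerConverter

// Fold IOB tags (B-PER, I-PER, ...) into complete entity chunks.
val nerConverter = new NerConverter()
  .setInputCols("document", "token", "label")
  .setOutputCol("ner_chunk")

val chunkPipeline = new Pipeline().setStages(Array(
  documentAssembler, tokenizer, tokenClassifier, nerConverter
))
// chunkPipeline.fit(data).transform(data).select("ner_chunk.result") then yields
// entity spans such as [John Lenon, London, Paris, Sarah, London].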
- See also
AlbertForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
- class AlbertForZeroShotClassification extends AnnotatorModel[AlbertForZeroShotClassification] with HasBatchedAnnotate[AlbertForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
- class BartForZeroShotClassification extends AnnotatorModel[BartForZeroShotClassification] with HasBatchedAnnotate[BartForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
BartForZeroShotClassification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of sequence classification models, but these models don't require a hardcoded number of potential classes; the candidate classes can be chosen at runtime. This usually makes them slower, but much more flexible.
Note that the model will loop through all provided labels, so the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = BartForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bart_large_zero_shot_classifier_mnli", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = BartForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
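Because the class mixes in HasCandidateLabelsProperties, the candidate classes are supplied at runtime via setCandidateLabels rather than baked into the model. A minimal sketch (the label set here is illustrative):

// Each candidate label is posed as a hypothesis against the input text.
val zeroShotClassifier = BartForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCandidateLabels(Array("sports", "politics", "cinema", "weather"))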
- See also
BartForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class BertForMultipleChoice extends AnnotatorModel[BertForMultipleChoice] with HasBatchedAnnotate[BertForMultipleChoice] with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasEngine
BertForMultipleChoice can load BERT Models with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = BertForMultipleChoice.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "bert_base_uncased_multiple_choice", if no name is provided. For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see BertForMultipleChoiceTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = BertForMultipleChoice.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("The Eiffel Tower is located in which country?", "Germany, France, Italy").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[France]             |
+---------------------+
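The answer choices are passed as a single context string split on a delimiter (a comma in the example above). A sketch of switching the separator, assuming a setChoicesDelimiter parameter on the multiple-choice annotator (verify against your Spark NLP version):

// Assumed parameter: use ";" to separate choices that may themselves contain commas.
val choiceClassifier = BertForMultipleChoice.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setChoicesDelimiter(";")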
- See also
BertForQuestionAnswering for Question Answering tasks
Annotators Main Page for a list of transformer based classifiers
- class BertForQuestionAnswering extends AnnotatorModel[BertForQuestionAnswering] with HasBatchedAnnotate[BertForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasEngine
BertForQuestionAnswering can load Bert Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = BertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "bert_base_cased_qa_squad2", if no name is provided. For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see BertForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = BertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
- See also
BertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class BertForSequenceClassification extends AnnotatorModel[BertForSequenceClassification] with HasBatchedAnnotate[BertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
BertForSequenceClassification can load Bert Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_base_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see BertForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
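When documents are split into sentences upstream (e.g. with SentenceDetector), the annotator can either emit one label per sentence or average the sentence probabilities into a single document-level prediction. A minimal sketch, assuming the coalesceSentences parameter of the sequence classifiers:

// Average sentence-level scores into one label per document.
val docLevelClassifier = BertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCoalesceSentences(true)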
- See also
BertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class BertForTokenClassification extends AnnotatorModel[BertForTokenClassification] with HasBatchedAnnotate[BertForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasEngine
BertForTokenClassification can load Bert Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_base_token_classifier_conll03", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see BertForTokenClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
- See also
BertForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
- class BertForZeroShotClassification extends AnnotatorModel[BertForZeroShotClassification] with HasBatchedAnnotate[BertForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
BertForZeroShotClassification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of BertForSequenceClassification models, but these models don't require a hardcoded number of potential classes; the candidate classes can be chosen at runtime. This usually makes them slower, but much more flexible.
Note that the model will loop through all provided labels, so the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = BertForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_zero_shot_classifier_mnli", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = BertForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
BertForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class CamemBertForQuestionAnswering extends AnnotatorModel[CamemBertForQuestionAnswering] with HasBatchedAnnotate[CamemBertForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
CamemBertForQuestionAnswering can load CamemBERT Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = CamemBertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "camembert_base_qa_fquad", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see CamemBertForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = CamemBertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
- See also
CamemBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class CamemBertForSequenceClassification extends AnnotatorModel[CamemBertForSequenceClassification] with HasBatchedAnnotate[CamemBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
CamemBertForSequenceClassification can load CamemBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = CamemBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "camembert_base_sequence_classifier_allocine", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and for more extended examples, see CamemBertForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = CamemBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("j'ai adoré ce film lorsque j'étais enfant.", "Je déteste ça.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
CamemBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class CamemBertForTokenClassification extends AnnotatorModel[CamemBertForTokenClassification] with HasBatchedAnnotate[CamemBertForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
CamemBertForTokenClassification can load CamemBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = CamemBertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "camembert_base_token_classifier_wikiner", if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the CamemBertForTokenClassificationTestSpec. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = CamemBertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("george washington est allé à washington").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------+
|result                        |
+------------------------------+
|[I-PER, I-PER, O, O, O, I-LOC]|
+------------------------------+
- See also
CamemBertForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
- class CamemBertForZeroShotClassification extends AnnotatorModel[CamemBertForZeroShotClassification] with HasBatchedAnnotate[CamemBertForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteOpenvinoModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
- class ClassifierDLApproach extends AnnotatorApproach[ClassifierDLModel] with ParamsAndFeaturesWritable with ClassifierEncoder
Trains a ClassifierDL for generic Multi-class Text Classification.
ClassifierDL uses the state-of-the-art Universal Sentence Encoder as an input for text classifications. The ClassifierDL annotator uses a deep learning model (DNN) built inside TensorFlow that supports up to 100 classes.
For instantiated/pretrained models, see ClassifierDLModel.
Notes:
- This annotator accepts a label column with a single item of type String, Int, Float, or Double.
- UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings can be used for the inputCol.
Setting a test dataset to monitor model metrics can be done with .setTestDataset. The method expects a path to a parquet file containing a dataframe that has the same required columns as the training dataframe. The pre-processing steps for the training dataframe should also be applied to the test dataframe. The following example will show how to create the test dataset:

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val embeddings = UniversalSentenceEncoder.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")

val preProcessingPipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))

val Array(train, test) = data.randomSplit(Array(0.8, 0.2))
preProcessingPipeline
  .fit(test)
  .transform(test)
  .write
  .mode("overwrite")
  .parquet("test_data")

val classifier = new ClassifierDLApproach()
  .setInputCols("sentence_embeddings")
  .setOutputCol("category")
  .setLabelColumn("label")
  .setTestDataset("test_data")
For extended examples of usage, see the Examples [1] [2] and the ClassifierDLTestSpec.
Example
In this example, the training data "sentiment.csv" has the form:

text,label
This movie is the best movie I have wached ever! In my opinion this movie can win an award.,0
This was a terrible movie! The acting was bad really bad!,1
...

Then training can be done like so:
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach
import org.apache.spark.ml.Pipeline

val smallCorpus = spark.read.option("header", "true").csv("src/test/resources/classifier/sentiment.csv")

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val useEmbeddings = UniversalSentenceEncoder.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")

val docClassifier = new ClassifierDLApproach()
  .setInputCols("sentence_embeddings")
  .setOutputCol("category")
  .setLabelColumn("label")
  .setBatchSize(64)
  .setMaxEpochs(20)
  .setLr(5e-3f)
  .setDropout(0.5f)

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, useEmbeddings, docClassifier))

val pipelineModel = pipeline.fit(smallCorpus)
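Once fitted, the resulting pipelineModel classifies unseen text with a plain transform; predictions land in the configured "category" output column. A minimal sketch:

import spark.implicits._

// Score new text with the fitted pipeline.
val newData = Seq("This movie was a complete waste of time.").toDF("text")
pipelineModel.transform(newData)
  .select("category.result")
  .show(false)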
- See also
MultiClassifierDLApproach for multi-class classification
SentimentDLApproach for sentiment analysis
- class ClassifierDLModel extends AnnotatorModel[ClassifierDLModel] with HasSimpleAnnotate[ClassifierDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable with HasEngine
ClassifierDL for generic Multi-class Text Classification.
ClassifierDL uses the state-of-the-art Universal Sentence Encoder as an input for text classifications. The ClassifierDL annotator uses a deep learning model (DNN) built inside TensorFlow that supports up to 100 classes.
This is the instantiated model of the ClassifierDLApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with pretrained of the companion object:

val classifierDL = ClassifierDLModel.pretrained()
  .setInputCols("sentence_embeddings")
  .setOutputCol("classification")

The default model is "classifierdl_use_trec6", if no name is provided. It uses embeddings from the UniversalSentenceEncoder and is trained on the TREC-6 dataset. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the ClassifierDLTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val useEmbeddings = UniversalSentenceEncoder.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")

val sarcasmDL = ClassifierDLModel.pretrained("classifierdl_use_sarcasm")
  .setInputCols("sentence_embeddings")
  .setOutputCol("sarcasm")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, sentence, useEmbeddings, sarcasmDL))

val data = Seq(
  "I'm ready!",
  "If I could put into words how much I love waking up at 6 am on Mondays I would."
).toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(arrays_zip(sentence, sarcasm)) as out")
  .selectExpr("out.sentence.result as sentence", "out.sarcasm.result as sarcasm")
  .show(false)
+-------------------------------------------------------------------------------+-------+
|sentence                                                                       |sarcasm|
+-------------------------------------------------------------------------------+-------+
|I'm ready!                                                                     |normal |
|If I could put into words how much I love waking up at 6 am on Mondays I would.|sarcasm|
+-------------------------------------------------------------------------------+-------+
- See also
MultiClassifierDLModel for multi-class classification
SentimentDLModel for sentiment analysis
- trait ClassifierEncoder extends EvaluationDLParams
- trait ClassifierMetrics extends Logging
- class DeBertaForQuestionAnswering extends AnnotatorModel[DeBertaForQuestionAnswering] with HasBatchedAnnotate[DeBertaForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
DeBertaForQuestionAnswering can load DeBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = DeBertaForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "deberta_v3_xsmall_qa_squad2", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see DeBertaForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = DeBertaForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
- See also
DeBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class DeBertaForSequenceClassification extends AnnotatorModel[DeBertaForSequenceClassification] with HasBatchedAnnotate[DeBertaForSequenceClassification] with WriteOnnxModel with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
DeBertaForSequenceClassification can load DeBerta v2 & v3 Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = DeBertaForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "deberta_v3_xsmall_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and for more extended examples, see DeBertaForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
DeBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class DeBertaForTokenClassification extends AnnotatorModel[DeBertaForTokenClassification] with HasBatchedAnnotate[DeBertaForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
DeBertaForTokenClassification can load DeBERTA Models v2 and v3 with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = DeBertaForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "deberta_v3_xsmall_token_classifier_conll03", if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the DeBertaForTokenClassificationTestSpec. Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = DeBertaForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
- See also
DeBertaForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
- class DeBertaForZeroShotClassification extends AnnotatorModel[DeBertaForZeroShotClassification] with HasBatchedAnnotate[DeBertaForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
DeBertaForZeroShotClassification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of DeBertaForSequenceClassification models, but these models don't require a hardcoded number of potential classes; the candidate classes can be chosen at runtime. This usually makes them slower, but much more flexible.
Note that the model will loop through all provided labels, so the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = DeBertaForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "deberta_base_zero_shot_classifier_mnli_anli_v3", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DeBertaForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
DeBertaForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class DistilBertForQuestionAnswering extends AnnotatorModel[DistilBertForQuestionAnswering] with HasBatchedAnnotate[DistilBertForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine
DistilBertForQuestionAnswering can load DistilBert Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = DistilBertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "distilbert_base_cased_qa_squad2", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and for more extended examples, see DistilBertForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = DistilBertForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
- See also
DistilBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class DistilBertForSequenceClassification extends AnnotatorModel[DistilBertForSequenceClassification] with HasBatchedAnnotate[DistilBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
DistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = DistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "distilbert_base_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see DistilBertForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
DistilBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class DistilBertForTokenClassification extends AnnotatorModel[DistilBertForTokenClassification] with HasBatchedAnnotate[DistilBertForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine
DistilBertForTokenClassification can load DistilBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = DistilBertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "distilbert_base_token_classifier_conll03", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see DistilBertForTokenClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
- See also
DistilBertForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
- class DistilBertForZeroShotClassification extends AnnotatorModel[DistilBertForZeroShotClassification] with HasBatchedAnnotate[DistilBertForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
DistilBertForZeroShotClassification using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of DistilBertForSequenceClassification models, but these models don't require a hardcoded number of potential classes; the candidate classes can be chosen at runtime. This usually makes them slower, but much more flexible.
Note that the model will loop through all provided labels, so the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = DistilBertForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "distilbert_base_zero_shot_classifier_uncased_mnli", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DistilBertForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
DistilBertForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class LongformerForQuestionAnswering extends AnnotatorModel[LongformerForQuestionAnswering] with HasBatchedAnnotate[LongformerForQuestionAnswering] with WriteTensorflowModel with HasCaseSensitiveProperties with HasEngine
LongformerForQuestionAnswering can load Longformer Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with pretrained of the companion object:

val spanClassifier = LongformerForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

The default model is "longformer_base_base_qa_squad2", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see LongformerForQuestionAnsweringTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = LongformerForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
- See also
LongformerForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class LongformerForSequenceClassification extends AnnotatorModel[LongformerForSequenceClassification] with HasBatchedAnnotate[LongformerForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
LongformerForSequenceClassification can load Longformer Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = LongformerForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "longformer_base_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and for more extended examples, see LongformerForSequenceClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = LongformerForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------+
|result|
+------+
|[pos] |
|[neg] |
+------+
- See also
LongformerForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
LongformerForTokenClassification extends AnnotatorModel[LongformerForTokenClassification] with HasBatchedAnnotate[LongformerForTokenClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasEngine
LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val tokenClassifier = LongformerForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"longformer_base_token_classifier_conll03"
, if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the LongformerForTokenClassificationTestSpec. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val tokenClassifier = LongformerForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, tokenClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------------------------------------------------------------------------------------+ |result | +------------------------------------------------------------------------------------+ |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]| +------------------------------------------------------------------------------------+
- See also
LongformerForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
MPNetForQuestionAnswering extends AnnotatorModel[MPNetForQuestionAnswering] with HasBatchedAnnotate[MPNetForQuestionAnswering] with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine
MPNetForQuestionAnswering can load MPNet Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
MPNetForQuestionAnswering can load MPNet Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with
pretrained
of the companion object:val spanClassifier = MPNetForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer")
The default model is
"mpnet_base_question_answering_squad2"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see MPNetForQuestionAnsweringTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val document = new MultiDocumentAssembler() .setInputCols("question", "context") .setOutputCols("document_question", "document_context") val questionAnswering = MPNetForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( document, questionAnswering )) val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context") val result = pipeline.fit(data).transform(data) result.select("answer.result").show(false) +---------------------+ |result | +---------------------+ |[Clara] | +---------------------+
- See also
MPNetForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
MPNetForSequenceClassification extends AnnotatorModel[MPNetForSequenceClassification] with HasBatchedAnnotate[MPNetForSequenceClassification] with WriteOnnxModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
MPNetForSequenceClassification can load MPNet Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
MPNetForSequenceClassification can load MPNet Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Note that currently, only SetFit models can be imported.
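Such SetFit exports are brought in through the companion object's loadSavedModel method before they can be used in a pipeline. A minimal sketch, assuming a SetFit model has already been exported to a local folder (both paths below are placeholders):
import com.johnsnowlabs.nlp.annotator._

// Load a locally exported SetFit model (the path is a placeholder)
val setFitClassifier = MPNetForSequenceClassification
  .loadSavedModel("/tmp/exported_setfit_mpnet", spark)
  .setInputCols(Array("document", "token"))
  .setOutputCol("label")

// Optionally persist it as a Spark NLP annotator for later reuse
setFitClassifier.write.overwrite().save("/tmp/mpnet_setfit_spark_nlp")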
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = MPNetForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"mpnet_sequence_classifier_ukr_message"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see MPNetForSequenceClassificationTestSpec.
Example
import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline import spark.implicits._ val document = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols(Array("document")) .setOutputCol("token") val sequenceClassifier = MPNetForSequenceClassification .pretrained() .setInputCols(Array("document", "token")) .setOutputCol("label") val texts = Seq( "I love driving my car.", "The next bus will arrive in 20 minutes.", "pineapple on pizza is the worst 🤮") val data = texts.toDF("text") val pipeline = new Pipeline().setStages(Array(document, tokenizer, sequenceClassifier)) val pipelineModel = pipeline.fit(data) val results = pipelineModel.transform(data) results.select("label.result").show() +--------------------+ | result| +--------------------+ | [TRANSPORT/CAR]| |[TRANSPORT/MOVEMENT]| | [FOOD]| +--------------------+
- See also
MPNetForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
MPNetForTokenClassification extends AnnotatorModel[MPNetForTokenClassification] with HasBatchedAnnotate[MPNetForTokenClassification] with WriteOnnxModel with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
MPNetForTokenClassification can load MPNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
MPNetForTokenClassification can load MPNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val tokenClassifier = MPNetForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"mpnet_base_token_classifier"
, if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the MPNetForTokenClassificationTestSpec. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val tokenClassifier = MPNetForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, tokenClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------------------------------------------------------------------------------------+ |result | +------------------------------------------------------------------------------------+ |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]| +------------------------------------------------------------------------------------+
- See also
MPNetForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
MultiClassifierDLApproach extends AnnotatorApproach[MultiClassifierDLModel] with ParamsAndFeaturesWritable with ClassifierEncoder
Trains a MultiClassifierDL for Multi-label Text Classification.
Trains a MultiClassifierDL for Multi-label Text Classification.
MultiClassifierDL uses a Bidirectional GRU with a convolutional model that we have built inside TensorFlow and supports up to 100 classes.
For instantiated/pretrained models, see MultiClassifierDLModel.
The input to
MultiClassifierDL
are Sentence Embeddings such as the state-of-the-art UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings.
In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each element (label) in y).
Notes:
- This annotator requires an array of labels of type String.
- UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings can be used for the inputCol.
Setting a test dataset to monitor model metrics can be done with
.setTestDataset
. The method expects a path to a parquet file containing a dataframe that has the same required columns as the training dataframe. The pre-processing steps for the training dataframe should also be applied to the test dataframe. The following example will show how to create the test dataset:val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val embeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("sentence_embeddings") val preProcessingPipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) val Array(train, test) = data.randomSplit(Array(0.8, 0.2)) preProcessingPipeline .fit(test) .transform(test) .write .mode("overwrite") .parquet("test_data") val multiClassifier = new MultiClassifierDLApproach() .setInputCols("sentence_embeddings") .setOutputCol("category") .setLabelColumn("label") .setTestDataset("test_data")
For extended examples of usage, see the Examples and the MultiClassifierDLTestSpec.
Example
In this example, the training data has the form (Note: labels can be arbitrary)
mr,ref "name[Alimentum], area[city centre], familyFriendly[no], near[Burger King]",Alimentum is an adult establish found in the city centre area near Burger King. "name[Alimentum], area[city centre], familyFriendly[yes]",Alimentum is a family-friendly place in the city centre. ...
It needs some pre-processing first, so the labels are of type
Array[String]
. This can be done like so:import spark.implicits._ import com.johnsnowlabs.nlp.annotators.classifier.dl.MultiClassifierDLApproach import com.johnsnowlabs.nlp.base.DocumentAssembler import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder import org.apache.spark.ml.Pipeline import org.apache.spark.sql.functions.{col, udf} // Process training data to create text with associated array of labels def splitAndTrim = udf { labels: String => labels.split(", ").map(x=>x.trim) } val smallCorpus = spark.read .option("header", true) .option("inferSchema", true) .option("mode", "DROPMALFORMED") .csv("src/test/resources/classifier/e2e.csv") .withColumn("labels", splitAndTrim(col("mr"))) .withColumn("text", col("ref")) .drop("mr") smallCorpus.printSchema() // root // |-- ref: string (nullable = true) // |-- labels: array (nullable = true) // | |-- element: string (containsNull = true) // Then create pipeline for training val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") .setCleanupMode("shrink") val embeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("embeddings") val docClassifier = new MultiClassifierDLApproach() .setInputCols("embeddings") .setOutputCol("category") .setLabelColumn("labels") .setBatchSize(128) .setMaxEpochs(10) .setLr(1e-3f) .setThreshold(0.5f) .setValidationSplit(0.1f) val pipeline = new Pipeline() .setStages( Array( documentAssembler, embeddings, docClassifier ) ) val pipelineModel = pipeline.fit(smallCorpus)
- See also
ClassifierDLApproach for single-class classification
SentimentDLApproach for sentiment analysis
-
class
MultiClassifierDLModel extends AnnotatorModel[MultiClassifierDLModel] with HasSimpleAnnotate[MultiClassifierDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable with HasEngine
MultiClassifierDL for Multi-label Text Classification.
MultiClassifierDL for Multi-label Text Classification.
MultiClassifierDL uses a Bidirectional GRU with a convolutional model that we have built inside TensorFlow and supports up to 100 classes. The input to MultiClassifierDL is Sentence Embeddings such as the state-of-the-art UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings.
This is the instantiated model of the MultiClassifierDLApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with
pretrained
of the companion object:val multiClassifier = MultiClassifierDLModel.pretrained() .setInputCols("sentence_embeddings") .setOutputCol("categories")
The default model is
"multiclassifierdl_use_toxic"
, if no name is provided. It uses embeddings from the UniversalSentenceEncoder and classifies toxic comments. The data is based on the Jigsaw Toxic Comment Classification Challenge. For available pretrained models please see the Models Hub.
In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each element (label) in y).
For extended examples of usage, see the Examples and the MultiClassifierDLTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base.DocumentAssembler import com.johnsnowlabs.nlp.annotators.classifier.dl.MultiClassifierDLModel import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val useEmbeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("sentence_embeddings") val multiClassifierDl = MultiClassifierDLModel.pretrained() .setInputCols("sentence_embeddings") .setOutputCol("classifications") val pipeline = new Pipeline() .setStages(Array( documentAssembler, useEmbeddings, multiClassifierDl )) val data = Seq( "This is pretty good stuff!", "Wtf kind of crap is this" ).toDF("text") val result = pipeline.fit(data).transform(data) result.select("text", "classifications.result").show(false) +--------------------------+----------------+ |text |result | +--------------------------+----------------+ |This is pretty good stuff!|[] | |Wtf kind of crap is this |[toxic, obscene]| +--------------------------+----------------+
- See also
ClassifierDLModel for single-class classification
SentimentDLModel for sentiment analysis
- trait ReadAlbertForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel with ReadSentencePieceModel
- trait ReadAlbertForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel with ReadSentencePieceModel
- trait ReadAlbertForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel with ReadSentencePieceModel
- trait ReadAlbertForZeroShotDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadBartForZeroShotDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
- trait ReadBertForMultipleChoiceModel extends ReadOnnxModel with ReadOpenvinoModel
- trait ReadBertForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
- trait ReadBertForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
- trait ReadBertForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
- trait ReadBertForZeroShotDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadOpenvinoModel
- trait ReadCamemBertForQADLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel with ReadOpenvinoModel
- trait ReadCamemBertForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel with ReadOpenvinoModel
- trait ReadCamemBertForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel with ReadOpenvinoModel
- trait ReadCamemBertForZeroShotClassification extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel with ReadOpenvinoModel
- trait ReadClassifierDLTensorflowModel extends ReadTensorflowModel
- trait ReadDeBertaForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadDeBertaForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadDeBertaForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadDeBertaForZeroShotDLModel extends ReadTensorflowModel with ReadSentencePieceModel with ReadOnnxModel
- trait ReadDistilBertForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadDistilBertForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadDistilBertForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadDistilBertForZeroShotDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadLongformerForQuestionAnsweringDLModel extends ReadTensorflowModel
- trait ReadLongformerForSequenceDLModel extends ReadTensorflowModel
- trait ReadLongformerForTokenDLModel extends ReadTensorflowModel
- trait ReadMPNetForQuestionAnsweringDLModel extends ReadOnnxModel
- trait ReadMPNetForSequenceDLModel extends ReadOnnxModel
- trait ReadMPNetForTokenDLModel extends ReadOnnxModel
- trait ReadMultiClassifierDLTensorflowModel extends ReadTensorflowModel
- trait ReadPretrainedCamemBertForZeroShotClassification extends ParamsAndFeaturesReadable[CamemBertForZeroShotClassification] with HasPretrained[CamemBertForZeroShotClassification]
- trait ReadRoBertaForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadRoBertaForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadRoBertaForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadRoBertaForZeroShotDLModel extends ReadTensorflowModel with ReadOnnxModel
- trait ReadSentimentDLTensorflowModel extends ReadTensorflowModel
- trait ReadTapasForQuestionAnsweringDLModel extends ReadTensorflowModel
- trait ReadXlmRoBertaForQuestionAnsweringDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadXlmRoBertaForSequenceDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadXlmRoBertaForTokenDLModel extends ReadTensorflowModel with ReadOnnxModel with ReadSentencePieceModel
- trait ReadXlmRoBertaForZeroShotDLModel extends ReadTensorflowModel with ReadSentencePieceModel with ReadOnnxModel
- trait ReadXlnetForSequenceDLModel extends ReadTensorflowModel with ReadSentencePieceModel
- trait ReadXlnetForTokenDLModel extends ReadTensorflowModel with ReadSentencePieceModel
- trait ReadablePretrainedAlbertForQAModel extends ParamsAndFeaturesReadable[AlbertForQuestionAnswering] with HasPretrained[AlbertForQuestionAnswering]
- trait ReadablePretrainedAlbertForSequenceModel extends ParamsAndFeaturesReadable[AlbertForSequenceClassification] with HasPretrained[AlbertForSequenceClassification]
- trait ReadablePretrainedAlbertForTokenModel extends ParamsAndFeaturesReadable[AlbertForTokenClassification] with HasPretrained[AlbertForTokenClassification]
- trait ReadablePretrainedAlbertForZeroShotModel extends ParamsAndFeaturesReadable[AlbertForZeroShotClassification] with HasPretrained[AlbertForZeroShotClassification]
- trait ReadablePretrainedBartForZeroShotModel extends ParamsAndFeaturesReadable[BartForZeroShotClassification] with HasPretrained[BartForZeroShotClassification]
- trait ReadablePretrainedBertForMultipleChoiceModel extends ParamsAndFeaturesReadable[BertForMultipleChoice] with HasPretrained[BertForMultipleChoice]
- trait ReadablePretrainedBertForQAModel extends ParamsAndFeaturesReadable[BertForQuestionAnswering] with HasPretrained[BertForQuestionAnswering]
- trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[BertForSequenceClassification] with HasPretrained[BertForSequenceClassification]
- trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[BertForTokenClassification] with HasPretrained[BertForTokenClassification]
- trait ReadablePretrainedBertForZeroShotModel extends ParamsAndFeaturesReadable[BertForZeroShotClassification] with HasPretrained[BertForZeroShotClassification]
- trait ReadablePretrainedCamemBertForQAModel extends ParamsAndFeaturesReadable[CamemBertForQuestionAnswering] with HasPretrained[CamemBertForQuestionAnswering]
- trait ReadablePretrainedCamemBertForSequenceModel extends ParamsAndFeaturesReadable[CamemBertForSequenceClassification] with HasPretrained[CamemBertForSequenceClassification]
- trait ReadablePretrainedCamemBertForTokenModel extends ParamsAndFeaturesReadable[CamemBertForTokenClassification] with HasPretrained[CamemBertForTokenClassification]
- trait ReadablePretrainedClassifierDL extends ParamsAndFeaturesReadable[ClassifierDLModel] with HasPretrained[ClassifierDLModel]
- trait ReadablePretrainedDeBertaForQAModel extends ParamsAndFeaturesReadable[DeBertaForQuestionAnswering] with HasPretrained[DeBertaForQuestionAnswering]
- trait ReadablePretrainedDeBertaForSequenceModel extends ParamsAndFeaturesReadable[DeBertaForSequenceClassification] with HasPretrained[DeBertaForSequenceClassification]
- trait ReadablePretrainedDeBertaForTokenModel extends ParamsAndFeaturesReadable[DeBertaForTokenClassification] with HasPretrained[DeBertaForTokenClassification]
- trait ReadablePretrainedDeBertaForZeroShotModel extends ParamsAndFeaturesReadable[DeBertaForZeroShotClassification] with HasPretrained[DeBertaForZeroShotClassification]
- trait ReadablePretrainedDistilBertForQAModel extends ParamsAndFeaturesReadable[DistilBertForQuestionAnswering] with HasPretrained[DistilBertForQuestionAnswering]
- trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[DistilBertForSequenceClassification] with HasPretrained[DistilBertForSequenceClassification]
- trait ReadablePretrainedDistilBertForTokenModel extends ParamsAndFeaturesReadable[DistilBertForTokenClassification] with HasPretrained[DistilBertForTokenClassification]
- trait ReadablePretrainedDistilBertForZeroShotModel extends ParamsAndFeaturesReadable[DistilBertForZeroShotClassification] with HasPretrained[DistilBertForZeroShotClassification]
- trait ReadablePretrainedLongformerForQAModel extends ParamsAndFeaturesReadable[LongformerForQuestionAnswering] with HasPretrained[LongformerForQuestionAnswering]
- trait ReadablePretrainedLongformerForSequenceModel extends ParamsAndFeaturesReadable[LongformerForSequenceClassification] with HasPretrained[LongformerForSequenceClassification]
- trait ReadablePretrainedLongformerForTokenModel extends ParamsAndFeaturesReadable[LongformerForTokenClassification] with HasPretrained[LongformerForTokenClassification]
- trait ReadablePretrainedMPNetForQAModel extends ParamsAndFeaturesReadable[MPNetForQuestionAnswering] with HasPretrained[MPNetForQuestionAnswering]
- trait ReadablePretrainedMPNetForSequenceModel extends ParamsAndFeaturesReadable[MPNetForSequenceClassification] with HasPretrained[MPNetForSequenceClassification]
- trait ReadablePretrainedMPNetForTokenDLModel extends ParamsAndFeaturesReadable[MPNetForTokenClassification] with HasPretrained[MPNetForTokenClassification]
- trait ReadablePretrainedMultiClassifierDL extends ParamsAndFeaturesReadable[MultiClassifierDLModel] with HasPretrained[MultiClassifierDLModel]
- trait ReadablePretrainedRoBertaForQAModel extends ParamsAndFeaturesReadable[RoBertaForQuestionAnswering] with HasPretrained[RoBertaForQuestionAnswering]
- trait ReadablePretrainedRoBertaForSequenceModel extends ParamsAndFeaturesReadable[RoBertaForSequenceClassification] with HasPretrained[RoBertaForSequenceClassification]
- trait ReadablePretrainedRoBertaForTokenModel extends ParamsAndFeaturesReadable[RoBertaForTokenClassification] with HasPretrained[RoBertaForTokenClassification]
- trait ReadablePretrainedRoBertaForZeroShotModel extends ParamsAndFeaturesReadable[RoBertaForZeroShotClassification] with HasPretrained[RoBertaForZeroShotClassification]
- trait ReadablePretrainedSentimentDL extends ParamsAndFeaturesReadable[SentimentDLModel] with HasPretrained[SentimentDLModel]
- trait ReadablePretrainedTapasForQAModel extends ParamsAndFeaturesReadable[TapasForQuestionAnswering] with HasPretrained[TapasForQuestionAnswering]
- trait ReadablePretrainedXlmRoBertaForQAModel extends ParamsAndFeaturesReadable[XlmRoBertaForQuestionAnswering] with HasPretrained[XlmRoBertaForQuestionAnswering]
- trait ReadablePretrainedXlmRoBertaForSequenceModel extends ParamsAndFeaturesReadable[XlmRoBertaForSequenceClassification] with HasPretrained[XlmRoBertaForSequenceClassification]
- trait ReadablePretrainedXlmRoBertaForTokenModel extends ParamsAndFeaturesReadable[XlmRoBertaForTokenClassification] with HasPretrained[XlmRoBertaForTokenClassification]
- trait ReadablePretrainedXlmRoBertaForZeroShotModel extends ParamsAndFeaturesReadable[XlmRoBertaForZeroShotClassification] with HasPretrained[XlmRoBertaForZeroShotClassification]
- trait ReadablePretrainedXlnetForSequenceModel extends ParamsAndFeaturesReadable[XlnetForSequenceClassification] with HasPretrained[XlnetForSequenceClassification]
- trait ReadablePretrainedXlnetForTokenModel extends ParamsAndFeaturesReadable[XlnetForTokenClassification] with HasPretrained[XlnetForTokenClassification]
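Each ReadablePretrained* trait above mixes HasPretrained into the corresponding companion object, which is what provides the pretrained overloads used throughout the examples on this page. A brief sketch of the usual call patterns, reusing the RoBERTa question-answering default named below (check the companion object for the exact overloads):
import com.johnsnowlabs.nlp.annotator._

// Default model for the annotator
val qaDefault = RoBertaForQuestionAnswering.pretrained()
// Explicit model name, optionally with a language code
val qaByName = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad2")
val qaByNameAndLang = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad2", "en")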
-
class
RoBertaForQuestionAnswering extends AnnotatorModel[RoBertaForQuestionAnswering] with HasBatchedAnnotate[RoBertaForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine
RoBertaForQuestionAnswering can load RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
RoBertaForQuestionAnswering can load RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with
pretrained
of the companion object:val spanClassifier = RoBertaForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer")
The default model is
"roberta_base_qa_squad2"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see RoBertaForQuestionAnsweringTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val document = new MultiDocumentAssembler() .setInputCols("question", "context") .setOutputCols("document_question", "document_context") val questionAnswering = RoBertaForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( document, questionAnswering )) val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context") val result = pipeline.fit(data).transform(data) result.select("answer.result").show(false) +---------------------+ |result | +---------------------+ |[Clara] | +---------------------+
- See also
RoBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
RoBertaForSequenceClassification extends AnnotatorModel[RoBertaForSequenceClassification] with HasBatchedAnnotate[RoBertaForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
RoBertaForSequenceClassification can load RoBERTa Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
RoBertaForSequenceClassification can load RoBERTa Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = RoBertaForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"roberta_base_sequence_classifier_imdb"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see RoBertaForSequenceClassification.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = RoBertaForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------+ |result| +------+ |[pos] | |[neg] | +------+
- See also
RoBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
RoBertaForTokenClassification extends AnnotatorModel[RoBertaForTokenClassification] with HasBatchedAnnotate[RoBertaForTokenClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine
RoBertaForTokenClassification can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
RoBertaForTokenClassification can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val tokenClassifier = RoBertaForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"roberta_base_token_classifier_conll03"
, if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the RoBertaForTokenClassificationTestSpec. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val tokenClassifier = RoBertaForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, tokenClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------------------------------------------------------------------------------------+ |result | +------------------------------------------------------------------------------------+ |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]| +------------------------------------------------------------------------------------+
- See also
RoBertaForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
RoBertaForZeroShotClassification extends AnnotatorModel[RoBertaForZeroShotClassification] with HasBatchedAnnotate[RoBertaForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
RoBertaForZeroShotClassification using a
ModelForSequenceClassification
trained on NLI (natural language inference) tasks. RoBertaForZeroShotClassification using a
ModelForSequenceClassification
trained on NLI (natural language inference) tasks. Equivalent of RoBertaForSequenceClassification
models, but these models don't require a hardcoded number of potential classes; they can be chosen at runtime. This usually means it's slower, but it is much more flexible. Note that the model will loop through all provided labels. So the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
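The runtime classes are supplied with setCandidateLabels (from HasCandidateLabelsProperties). A minimal sketch, where the label values are illustrative placeholders:
val zeroShotClassifier = RoBertaForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  // Each candidate label is posed as an NLI hypothesis against the input text
  .setCandidateLabels(Array("urgent", "travel", "movie", "sport", "technology"))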
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = RoBertaForZeroShotClassification .pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"roberta_base_zero_shot_classifier_nli"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = RoBertaForZeroShotClassification .pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------+ |result| +------+ |[pos] | |[neg] | +------+
- See also
RoBertaForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
SentimentDLApproach extends AnnotatorApproach[SentimentDLModel] with ParamsAndFeaturesWritable with ClassifierEncoder
Trains a SentimentDL, an annotator for multi-class sentiment analysis.
Trains a SentimentDL, an annotator for multi-class sentiment analysis.
In natural language processing, sentiment analysis is the task of classifying the affective state or subjective view of a text. A common example is whether a product review or a tweet can be interpreted positively or negatively.
For the instantiated/pretrained models, see SentimentDLModel.
Notes:
- This annotator accepts a label column of a single item in either type of String, Int, Float, or Double. So positive sentiment can be expressed as either "positive" or 0, negative sentiment as "negative" or 1 (see the sketch after this list).
- UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings can be used for the inputCol.
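As a small illustration of the first note, the same two sentiments can be carried by either column type; the column names are illustrative:
import spark.implicits._

// String labels and integer labels are both accepted by setLabelColumn
val asStrings = Seq(("I loved it", "positive"), ("I hated it", "negative")).toDF("text", "label")
val asInts = Seq(("I loved it", 0), ("I hated it", 1)).toDF("text", "label")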
Setting a test dataset to monitor model metrics can be done with
.setTestDataset
. The method expects a path to a parquet file containing a dataframe that has the same required columns as the training dataframe. The pre-processing steps for the training dataframe should also be applied to the test dataframe. The following example will show how to create the test dataset:val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val embeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("sentence_embeddings") val preProcessingPipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) val Array(train, test) = data.randomSplit(Array(0.8, 0.2)) preProcessingPipeline .fit(test) .transform(test) .write .mode("overwrite") .parquet("test_data") val classifier = new SentimentDLApproach() .setInputCols("sentence_embeddings") .setOutputCol("sentiment") .setLabelColumn("label") .setTestDataset("test_data")
For extended examples of usage, see the Examples and the SentimentDLTestSpec.
Example
In this example,
sentiment.csv
is in the formtext,label This movie is the best movie I have watched ever! In my opinion this movie can win an award.,0 This was a terrible movie! The acting was bad really bad!,1
The model can then be trained with
import com.johnsnowlabs.nlp.base.DocumentAssembler import com.johnsnowlabs.nlp.annotator.UniversalSentenceEncoder import com.johnsnowlabs.nlp.annotators.classifier.dl.{SentimentDLApproach, SentimentDLModel} import org.apache.spark.ml.Pipeline val smallCorpus = spark.read.option("header", "true").csv("src/test/resources/classifier/sentiment.csv") val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val useEmbeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("sentence_embeddings") val docClassifier = new SentimentDLApproach() .setInputCols("sentence_embeddings") .setOutputCol("sentiment") .setLabelColumn("label") .setBatchSize(32) .setMaxEpochs(1) .setLr(5e-3f) .setDropout(0.5f) val pipeline = new Pipeline() .setStages( Array( documentAssembler, useEmbeddings, docClassifier ) ) val pipelineModel = pipeline.fit(smallCorpus)
- See also
ClassifierDLApproach for general single-class classification
MultiClassifierDLApproach for general multi-class classification
-
class
SentimentDLModel extends AnnotatorModel[SentimentDLModel] with HasSimpleAnnotate[SentimentDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable with HasEngine
SentimentDL, an annotator for multi-class sentiment analysis.
SentimentDL, an annotator for multi-class sentiment analysis.
In natural language processing, sentiment analysis is the task of classifying the affective state or subjective view of a text. A common example is whether a product review or a tweet can be interpreted positively or negatively.
This is the instantiated model of the SentimentDLApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with
pretrained
of the companion object:val sentiment = SentimentDLModel.pretrained() .setInputCols("sentence_embeddings") .setOutputCol("sentiment")
The default model is
"sentimentdl_use_imdb"
, if no name is provided. It is an English sentiment analysis model trained on the IMDB dataset. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the SentimentDLTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base.DocumentAssembler import com.johnsnowlabs.nlp.annotator.UniversalSentenceEncoder import com.johnsnowlabs.nlp.annotators.classifier.dl.SentimentDLModel import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val useEmbeddings = UniversalSentenceEncoder.pretrained() .setInputCols("document") .setOutputCol("sentence_embeddings") val sentiment = SentimentDLModel.pretrained("sentimentdl_use_twitter") .setInputCols("sentence_embeddings") .setThreshold(0.7F) .setOutputCol("sentiment") val pipeline = new Pipeline().setStages(Array( documentAssembler, useEmbeddings, sentiment )) val data = Seq( "Wow, the new video is awesome!", "bruh what a damn waste of time" ).toDF("text") val result = pipeline.fit(data).transform(data) result.select("text", "sentiment.result").show(false) +------------------------------+----------+ |text |result | +------------------------------+----------+ |Wow, the new video is awesome!|[positive]| |bruh what a damn waste of time|[negative]| +------------------------------+----------+
- See also
ClassifierDLModel for general single-class classification
MultiClassifierDLModel for general multi-class classification
-
class
TapasForQuestionAnswering extends BertForQuestionAnswering
TapasForQuestionAnswering is an implementation of TaPas - a BERT-based model specifically designed for answering questions about tabular data.
TapasForQuestionAnswering is an implementation of TaPas - a BERT-based model specifically designed for answering questions about tabular data. It takes TABLE and DOCUMENT annotations as input and tries to answer the questions in the document by using the data from the table. The model is based on BertForQuestionAnswering and shares all its parameters with it.
Pretrained models can be loaded with
pretrained
of the companion object:val tapas = TapasForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "table")) .setOutputCol("answer")
The default model is
"table_qa_tapas_base_finetuned_wtq"
, if no name is provided. For available pretrained models please see the Models Hub.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val questions = """ |Who earns 100,000,000? |Who has more money? |How much they all earn? |How old are they? |""".stripMargin.trim val jsonData = """ |{ | "header": ["name", "money", "age"], | "rows": [ | ["Donald Trump", "$100,000,000", "75"], | ["Elon Musk", "$20,000,000,000,000", "55"] | ] |} |""".stripMargin.trim val data = Seq((jsonData, questions)) .toDF("json_table", "questions") .repartition(1) val docAssembler = new MultiDocumentAssembler() .setInputCols("json_table", "questions") .setOutputCols("document_table", "document_questions") val sentenceDetector = SentenceDetectorDLModel .pretrained() .setInputCols(Array("document_questions")) .setOutputCol("question") val tableAssembler = new TableAssembler() .setInputFormat("json") .setInputCols(Array("document_table")) .setOutputCol("table") val tapas = TapasForQuestionAnswering .pretrained() .setInputCols(Array("question", "table")) .setOutputCol("answer") val pipeline = new Pipeline() .setStages( Array( docAssembler, sentenceDetector, tableAssembler, tapas)) val pipelineModel = pipeline.fit(data) val result = pipelineModel.transform(data) result .selectExpr("explode(answer) as answer") .selectExpr( "answer.metadata.question", "answer.result") .show(false) +-----------------------+----------------------------------------+ |question |result | +-----------------------+----------------------------------------+ |Who earns 100,000,000? |Donald Trump | |Who has more money? |Elon Musk | |How much they all earn?|COUNT($100,000,000, $20,000,000,000,000)| |How old are they? |AVERAGE(75, 55) | +-----------------------+----------------------------------------+
- See also
https://aclanthology.org/2020.acl-main.398/ for more details about the TaPas model
TableAssembler for loading tabular data
Annotators Main Page for a list of transformer based classifiers
-
class
XlmRoBertaForQuestionAnswering extends AnnotatorModel[XlmRoBertaForQuestionAnswering] with HasBatchedAnnotate[XlmRoBertaForQuestionAnswering] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
XlmRoBertaForQuestionAnswering can load XLM-RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
XlmRoBertaForQuestionAnswering can load XLM-RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).
Pretrained models can be loaded with
pretrained
of the companion object:val spanClassifier = XlmRoBertaForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer")
The default model is
"xlm_roberta_base_qa_squad2"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see XlmRoBertaForQuestionAnsweringTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val document = new MultiDocumentAssembler() .setInputCols("question", "context") .setOutputCols("document_question", "document_context") val questionAnswering = XlmRoBertaForQuestionAnswering.pretrained() .setInputCols(Array("document_question", "document_context")) .setOutputCol("answer") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( document, questionAnswering )) val data = Seq("What's my name?", "My name is Clara and I live in Berkeley.").toDF("question", "context") val result = pipeline.fit(data).transform(data) result.select("answer.result").show(false) +---------------------+ |result | +---------------------+ |[Clara] | +---------------------+
- See also
XlmRoBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
XlmRoBertaForSequenceClassification extends AnnotatorModel[XlmRoBertaForSequenceClassification] with HasBatchedAnnotate[XlmRoBertaForSequenceClassification] with WriteOnnxModel with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
XlmRoBertaForSequenceClassification can load XLM-RoBERTa Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
XlmRoBertaForSequenceClassification can load XLM-RoBERTa Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"xlm_roberta_base_sequence_classifier_imdb"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see XlmRoBertaForSequenceClassification.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------+ |result| +------+ |[pos] | |[neg] | +------+
- See also
XlmRoBertaForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
XlmRoBertaForTokenClassification extends AnnotatorModel[XlmRoBertaForTokenClassification] with HasBatchedAnnotate[XlmRoBertaForTokenClassification] with WriteOnnxModel with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
XlmRoBertaForTokenClassification can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
XlmRoBertaForTokenClassification can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val tokenClassifier = XlmRoBertaForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"mpnet_base_token_classifier"
, if no name is provided. For available pretrained models please see the Models Hub.
For extended examples of usage, see the Examples and the XlmRoBertaForTokenClassificationTestSpec. To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val tokenClassifier = XlmRoBertaForTokenClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, tokenClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------------------------------------------------------------------------------------+ |result | +------------------------------------------------------------------------------------+ |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]| +------------------------------------------------------------------------------------+
- See also
XlmRoBertaForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
XlmRoBertaForZeroShotClassification extends AnnotatorModel[XlmRoBertaForZeroShotClassification] with HasBatchedAnnotate[XlmRoBertaForZeroShotClassification] with WriteTensorflowModel with WriteOnnxModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine with HasCandidateLabelsProperties
XlmRoBertaForZeroShotClassification using a
ModelForSequenceClassification
trained on NLI (natural language inference) tasks. XlmRoBertaForZeroShotClassification using a
ModelForSequenceClassification
trained on NLI (natural language inference) tasks. Equivalent of XlmRoBertaForSequenceClassification
models, but these models don't require a hardcoded number of potential classes; they can be chosen at runtime. This usually means it's slower, but it is much more flexible. Note that the model will loop through all provided labels. So the more labels you have, the longer this process will take.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
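As with the other zero-shot annotators, the candidate classes are supplied at runtime through setCandidateLabels; since XLM-RoBERTa is multilingual, the same labels can be scored against texts in many languages. A brief sketch with illustrative labels:
val zeroShotClassifier = XlmRoBertaForZeroShotClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  // Classes are chosen at inference time, no retraining needed
  .setCandidateLabels(Array("politics", "sports", "science"))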
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = XlmRoBertaForZeroShotClassification .pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"xlm_roberta_large_zero_shot_classifier_xnli_anli"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = XlmRoBertaForZeroShotClassification .pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------+ |result| +------+ |[pos] | |[neg] | +------+
- See also
XlmRoBertaForZeroShotClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
XlnetForSequenceClassification extends AnnotatorModel[XlnetForSequenceClassification] with HasBatchedAnnotate[XlnetForSequenceClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties with HasEngine
XlnetForSequenceClassification can load XLNet Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
XlnetForSequenceClassification can load XLNet Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = XlnetForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"xlnet_base_sequence_classifier_imdb"
, if no name is provided. For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see XlnetForSequenceClassification.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = XlnetForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("I loved this movie when I was a child.", "It was pretty boring.").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +------+ |result| +------+ |[pos] | |[neg] | +------+
- See also
XlnetForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
-
class
XlnetForTokenClassification extends AnnotatorModel[XlnetForTokenClassification] with HasBatchedAnnotate[XlnetForTokenClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasEngine
XlnetForTokenClassification can load XLNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
Pretrained models can be loaded with pretrained of the companion object:

val tokenClassifier = XlnetForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
The default model is "xlnet_base_token_classifier_conll03", if no name is provided.

For available pretrained models please see the Models Hub.
To see which models are compatible and how to import them see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669 and to see more extended examples, see the XlnetForTokenClassificationTestSpec.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlnetForTokenClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+------------------------------------------------------------------------------------+
|result                                                                              |
+------------------------------------------------------------------------------------+
|[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
+------------------------------------------------------------------------------------+
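The IOB tags in the result above can be grouped into full entity chunks with Spark NLP's NerConverter (already in scope via the com.johnsnowlabs.nlp.annotator._ import in the example; the "entities" column name is illustrative):

// Merge B-/I- tagged tokens into whole entities, e.g. "John Lenon".
val nerConverter = new NerConverter()
  .setInputCols("document", "token", "label")
  .setOutputCol("entities")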
- See also
XlnetForTokenClassification for token-level classification
Annotators Main Page for a list of transformer based classifiers
Value Members
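The companion objects below all follow the same pattern: their ReadablePretrained... traits supply the pretrained() overloads (optionally taking a model name and language), and a model saved to disk can be restored with load. A hedged usage sketch (the model name and path are illustrative):

// Download a specific pretrained model by name and language,
// or restore a model that was previously saved with write.save(...).
val fromHub = BertForSequenceClassification.pretrained(
  "bert_base_sequence_classifier_imdb", "en")
val fromDisk = BertForSequenceClassification.load("/tmp/my_saved_model")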
-
object
AlbertForQuestionAnswering extends ReadablePretrainedAlbertForQAModel with ReadAlbertForQuestionAnsweringDLModel with Serializable
This is the companion object of AlbertForQuestionAnswering. Please refer to that class for the documentation.
-
object
AlbertForSequenceClassification extends ReadablePretrainedAlbertForSequenceModel with ReadAlbertForSequenceDLModel with Serializable
This is the companion object of AlbertForSequenceClassification. Please refer to that class for the documentation.
-
object
AlbertForTokenClassification extends ReadablePretrainedAlbertForTokenModel with ReadAlbertForTokenDLModel with Serializable
This is the companion object of AlbertForTokenClassification. Please refer to that class for the documentation.
-
object
AlbertForZeroShotClassification extends ReadablePretrainedAlbertForZeroShotModel with ReadAlbertForZeroShotDLModel with Serializable
This is the companion object of AlbertForZeroShotClassification. Please refer to that class for the documentation.
-
object
BartForZeroShotClassification extends ReadablePretrainedBartForZeroShotModel with ReadBartForZeroShotDLModel with Serializable
This is the companion object of BartForZeroShotClassification. Please refer to that class for the documentation.
-
object
BertForMultipleChoice extends ReadablePretrainedBertForMultipleChoiceModel with ReadBertForMultipleChoiceModel with Serializable
This is the companion object of BertForMultipleChoice. Please refer to that class for the documentation.
-
object
BertForQuestionAnswering extends ReadablePretrainedBertForQAModel with ReadBertForQuestionAnsweringDLModel with Serializable
This is the companion object of BertForQuestionAnswering. Please refer to that class for the documentation.
-
object
BertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadBertForSequenceDLModel with Serializable
This is the companion object of BertForSequenceClassification. Please refer to that class for the documentation.
-
object
BertForTokenClassification extends ReadablePretrainedBertForTokenModel with ReadBertForTokenDLModel with Serializable
This is the companion object of BertForTokenClassification. Please refer to that class for the documentation.
-
object
BertForZeroShotClassification extends ReadablePretrainedBertForZeroShotModel with ReadBertForZeroShotDLModel with Serializable
This is the companion object of BertForZeroShotClassification. Please refer to that class for the documentation.
-
object
CamemBertForQuestionAnswering extends ReadablePretrainedCamemBertForQAModel with ReadCamemBertForQADLModel with Serializable
This is the companion object of CamemBertForQuestionAnswering. Please refer to that class for the documentation.
-
object
CamemBertForSequenceClassification extends ReadablePretrainedCamemBertForSequenceModel with ReadCamemBertForSequenceDLModel with Serializable
This is the companion object of CamemBertForSequenceClassification. Please refer to that class for the documentation.
-
object
CamemBertForTokenClassification extends ReadablePretrainedCamemBertForTokenModel with ReadCamemBertForTokenDLModel with Serializable
This is the companion object of CamemBertForTokenClassification. Please refer to that class for the documentation.
-
object
CamemBertForZeroShotClassification extends ReadPretrainedCamemBertForZeroShotClassification with ReadCamemBertForZeroShotClassification with Serializable
This is the companion object of CamemBertForZeroShotClassification. Please refer to that class for the documentation.
-
object
ClassifierDLApproach extends DefaultParamsReadable[ClassifierDLApproach] with Serializable
This is the companion object of ClassifierDLApproach. Please refer to that class for the documentation.
-
object
ClassifierDLModel extends ReadablePretrainedClassifierDL with ReadClassifierDLTensorflowModel with Serializable
This is the companion object of ClassifierDLModel. Please refer to that class for the documentation.
-
object
DeBertaForQuestionAnswering extends ReadablePretrainedDeBertaForQAModel with ReadDeBertaForQuestionAnsweringDLModel with Serializable
This is the companion object of DeBertaForQuestionAnswering. Please refer to that class for the documentation.
-
object
DeBertaForSequenceClassification extends ReadablePretrainedDeBertaForSequenceModel with ReadDeBertaForSequenceDLModel with Serializable
This is the companion object of DeBertaForSequenceClassification. Please refer to that class for the documentation.
-
object
DeBertaForTokenClassification extends ReadablePretrainedDeBertaForTokenModel with ReadDeBertaForTokenDLModel with Serializable
This is the companion object of DeBertaForTokenClassification. Please refer to that class for the documentation.
-
object
DeBertaForZeroShotClassification extends ReadablePretrainedDeBertaForZeroShotModel with ReadDeBertaForZeroShotDLModel with Serializable
This is the companion object of DeBertaForZeroShotClassification. Please refer to that class for the documentation.
-
object
DistilBertForQuestionAnswering extends ReadablePretrainedDistilBertForQAModel with ReadDistilBertForQuestionAnsweringDLModel with Serializable
This is the companion object of DistilBertForQuestionAnswering. Please refer to that class for the documentation.
-
object
DistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceDLModel with Serializable
This is the companion object of DistilBertForSequenceClassification. Please refer to that class for the documentation.
-
object
DistilBertForTokenClassification extends ReadablePretrainedDistilBertForTokenModel with ReadDistilBertForTokenDLModel with Serializable
This is the companion object of DistilBertForTokenClassification. Please refer to that class for the documentation.
-
object
DistilBertForZeroShotClassification extends ReadablePretrainedDistilBertForZeroShotModel with ReadDistilBertForZeroShotDLModel with Serializable
This is the companion object of DistilBertForZeroShotClassification. Please refer to that class for the documentation.
-
object
LongformerForQuestionAnswering extends ReadablePretrainedLongformerForQAModel with ReadLongformerForQuestionAnsweringDLModel with Serializable
This is the companion object of LongformerForQuestionAnswering. Please refer to that class for the documentation.
-
object
LongformerForSequenceClassification extends ReadablePretrainedLongformerForSequenceModel with ReadLongformerForSequenceDLModel with Serializable
This is the companion object of LongformerForSequenceClassification. Please refer to that class for the documentation.
-
object
LongformerForTokenClassification extends ReadablePretrainedLongformerForTokenModel with ReadLongformerForTokenDLModel with Serializable
This is the companion object of LongformerForTokenClassification. Please refer to that class for the documentation.
-
object
MPNetForQuestionAnswering extends ReadablePretrainedMPNetForQAModel with ReadMPNetForQuestionAnsweringDLModel with Serializable
This is the companion object of MPNetForQuestionAnswering. Please refer to that class for the documentation.
-
object
MPNetForSequenceClassification extends ReadablePretrainedMPNetForSequenceModel with ReadMPNetForSequenceDLModel with Serializable
This is the companion object of MPNetForSequenceClassification. Please refer to that class for the documentation.
-
object
MPNetForTokenClassification extends ReadablePretrainedMPNetForTokenDLModel with ReadMPNetForTokenDLModel with Serializable
This is the companion object of MPNetForTokenClassification. Please refer to that class for the documentation.
-
object
MultiClassifierDLModel extends ReadablePretrainedMultiClassifierDL with ReadMultiClassifierDLTensorflowModel with Serializable
This is the companion object of MultiClassifierDLModel. Please refer to that class for the documentation.
-
object
RoBertaForQuestionAnswering extends ReadablePretrainedRoBertaForQAModel with ReadRoBertaForQuestionAnsweringDLModel with Serializable
This is the companion object of RoBertaForQuestionAnswering. Please refer to that class for the documentation.
-
object
RoBertaForSequenceClassification extends ReadablePretrainedRoBertaForSequenceModel with ReadRoBertaForSequenceDLModel with Serializable
This is the companion object of RoBertaForSequenceClassification. Please refer to that class for the documentation.
-
object
RoBertaForTokenClassification extends ReadablePretrainedRoBertaForTokenModel with ReadRoBertaForTokenDLModel with Serializable
This is the companion object of RoBertaForTokenClassification. Please refer to that class for the documentation.
-
object
RoBertaForZeroShotClassification extends ReadablePretrainedRoBertaForZeroShotModel with ReadRoBertaForZeroShotDLModel with Serializable
This is the companion object of RoBertaForZeroShotClassification. Please refer to that class for the documentation.
-
object
SentimentApproach extends DefaultParamsReadable[SentimentDLApproach]
This is the companion object of SentimentApproach. Please refer to that class for the documentation.
-
object
SentimentDLModel extends ReadablePretrainedSentimentDL with ReadSentimentDLTensorflowModel with Serializable
This is the companion object of SentimentDLModel. Please refer to that class for the documentation.
-
object
TapasForQuestionAnswering extends ReadablePretrainedTapasForQAModel with ReadTapasForQuestionAnsweringDLModel with Serializable
This is the companion object of TapasForQuestionAnswering. Please refer to that class for the documentation.
-
object
XlmRoBertaForQuestionAnswering extends ReadablePretrainedXlmRoBertaForQAModel with ReadXlmRoBertaForQuestionAnsweringDLModel with Serializable
This is the companion object of XlmRoBertaForQuestionAnswering. Please refer to that class for the documentation.
-
object
XlmRoBertaForSequenceClassification extends ReadablePretrainedXlmRoBertaForSequenceModel with ReadXlmRoBertaForSequenceDLModel with Serializable
This is the companion object of XlmRoBertaForSequenceClassification. Please refer to that class for the documentation.
-
object
XlmRoBertaForTokenClassification extends ReadablePretrainedXlmRoBertaForTokenModel with ReadXlmRoBertaForTokenDLModel with Serializable
This is the companion object of XlmRoBertaForTokenClassification. Please refer to that class for the documentation.
-
object
XlmRoBertaForZeroShotClassification extends ReadablePretrainedXlmRoBertaForZeroShotModel with ReadXlmRoBertaForZeroShotDLModel with Serializable
This is the companion object of XlmRoBertaForZeroShotClassification. Please refer to that class for the documentation.
-
object
XlnetForSequenceClassification extends ReadablePretrainedXlnetForSequenceModel with ReadXlnetForSequenceDLModel with Serializable
This is the companion object of XlnetForSequenceClassification. Please refer to that class for the documentation.
-
object
XlnetForTokenClassification extends ReadablePretrainedXlnetForTokenModel with ReadXlnetForTokenDLModel with Serializable
This is the companion object of XlnetForTokenClassification. Please refer to that class for the documentation.