sparknlp.annotator.embeddings.bi_encoder_multimodal_embeddings

Module Contents

Classes

BiEncoderMultimodalEmbeddings

Dual-encoder multimodal embeddings annotator.

class BiEncoderMultimodalEmbeddings(classname='com.johnsnowlabs.nlp.embeddings.BiEncoderMultimodalEmbeddings', java_model=None)

Dual-encoder multimodal embeddings annotator.

The output is written to two derived columns based on outputCol: <outputCol>_doc_embeddings and <outputCol>_image_embeddings.
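The naming rule for the two derived columns can be written as a small helper. This is a minimal sketch: `derived_output_columns` is a hypothetical name used only for illustration and is not part of the Spark NLP API; it simply reproduces the `<outputCol>_doc_embeddings` / `<outputCol>_image_embeddings` pattern described above.

```python
from typing import Tuple


def derived_output_columns(output_col: str) -> Tuple[str, str]:
    """Return the (text, image) embedding column names derived from outputCol.

    Hypothetical helper: it only mirrors the documented naming rule,
    it does not query the annotator.
    """
    if not output_col:
        raise ValueError("output_col must be a non-empty string")
    return f"{output_col}_doc_embeddings", f"{output_col}_image_embeddings"


doc_col, image_col = derived_output_columns("embeddings")
print(doc_col)    # embeddings_doc_embeddings
print(image_col)  # embeddings_image_embeddings
```

On a transformed DataFrame, these two names can then be passed to PySpark's `DataFrame.select` to read the text and image embeddings separately.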

Input Annotation types    Output Annotation type
DOCUMENT, IMAGE           SENTENCE_EMBEDDINGS
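Both DOCUMENT and IMAGE inputs come out as SENTENCE_EMBEDDINGS vectors. With dual-encoder models, text and image vectors are typically compared by cosine similarity, for example to rank images against a text query. The sketch below assumes that the document and image vectors produced by this annotator share one embedding space of equal dimension and have already been collected into plain Python lists; `rank_images` is a hypothetical helper and the vectors are toy values, not real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_images(query: Sequence[float],
                images: List[Tuple[str, Sequence[float]]]) -> List[Tuple[str, float]]:
    """Sort (image_id, vector) pairs by similarity to the text query, best first."""
    scored = [(image_id, cosine_similarity(query, vec)) for image_id, vec in images]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for one <outputCol>_doc_embeddings value and
# several <outputCol>_image_embeddings values (hypothetical data).
text_vec = [1.0, 0.0, 1.0]
image_vecs = [("cat.jpg", [0.9, 0.1, 0.8]), ("car.jpg", [0.0, 1.0, 0.1])]
for image_id, score in rank_images(text_vec, image_vecs):
    print(f"{image_id}: {score:.3f}")
```

With these toy values, `cat.jpg` points in nearly the same direction as the query and ranks first.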

name = 'BiEncoderMultimodalEmbeddings'
inputAnnotatorTypes
outputAnnotatorType = 'sentence_embeddings'
batchSize
setParams()
setBatchSize(value)
static loadSavedModel(folder, spark_session)

Loads a locally saved external dual ONNX model.

Parameters:
folder : str

Folder of the external model bundle.

spark_session : pyspark.sql.SparkSession

The current SparkSession.

Returns:
BiEncoderMultimodalEmbeddings

The restored model.
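Since loadSavedModel reads the model bundle from a local folder, a quick check of the path before handing it to Spark can catch typos early. This is a minimal sketch: `find_onnx_files` is a hypothetical helper, and it assumes only that the exported ONNX graphs are stored as `.onnx` files at the top level of the folder; the exact file names Spark NLP expects are not specified on this page.

```python
from pathlib import Path
from typing import List


def find_onnx_files(folder: str) -> List[str]:
    """Return the sorted names of top-level .onnx files in a local model folder.

    Hypothetical pre-flight check; it does not validate the bundle contents
    beyond the presence of .onnx files.
    """
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"model folder not found: {folder}")
    onnx_files = sorted(p.name for p in path.glob("*.onnx") if p.is_file())
    if not onnx_files:
        raise FileNotFoundError(f"no .onnx files in {folder}")
    return onnx_files
```

If the check passes, the same folder path is what goes into `BiEncoderMultimodalEmbeddings.loadSavedModel(folder, spark_session)`.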

static pretrained(name='ops_mm_embedding_v1_2b', lang='en', remote_loc=None)

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default “ops_mm_embedding_v1_2b”.

lang : str, optional

Language of the pretrained model, by default “en”.

remote_loc : str, optional

Optional remote address of the resource. If not set, the Spark NLP repositories are used.

Returns:
BiEncoderMultimodalEmbeddings

The restored model.