all-mpnet-base-v2 from sentence-transformers OpenVINO

Description

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

This model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures the semantic information. The sentence vector may be used for information retrieval, clustering, or sentence similarity tasks.
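As a rough illustration of the sentence similarity and retrieval use case, the sketch below ranks a few candidate vectors against a query vector by cosine similarity. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of Spark NLP, and the 3-dimensional vectors are toy stand-ins for the 768-dimensional embeddings this model produces.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional stand-ins; real vectors from this model have 768 entries.
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, corpus)
```

In practice the query and corpus vectors would come from the embeddings column produced by the pipeline, collected to the driver or compared inside Spark.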

By default, input text longer than 384 word pieces is truncated.
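Because anything past the limit is dropped, longer documents are sometimes split into smaller pieces before encoding. The following is only a sketch of that idea: `split_into_chunks` is a hypothetical helper, it counts whitespace-separated words rather than word pieces, and the defaults of 200 words with a 20-word overlap are arbitrary, conservative assumptions (one word often becomes more than one word piece).

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into overlapping windows of whitespace-separated words.

    Word counts are only a proxy for word pieces, so max_words is kept
    well below the 384 word-piece limit. Both defaults are assumptions.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks

# Example: a 450-word text becomes three overlapping chunks.
long_text = " ".join(str(i) for i in range(450))
chunks = split_into_chunks(long_text)
```

Each chunk can then be placed in its own row of the input DataFrame, which yields one embedding per chunk.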


How to use

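# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import MPNetEmbeddings
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available.
spark = sparknlp.start()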

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

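# load() reads a saved model from the given path, so the downloaded
# model files are expected at "all_mpnet_base_v2_openvino".
mpnet_loaded = MPNetEmbeddings.load("all_mpnet_base_v2_openvino")\
    .setInputCols(["document"])\
    .setOutputCol("mpnet_embeddings")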

pipeline = Pipeline(
    stages=[
        document_assembler,
        mpnet_loaded
    ])

data = spark.createDataFrame([
    ['William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist.']
]).toDF("text")

model = pipeline.fit(data)
result = model.transform(data)

result.selectExpr("explode(mpnet_embeddings.embeddings) as embeddings").show()

// Scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.functions.explode
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val mpnetEmbeddings = MPNetEmbeddings.load("all_mpnet_base_v2_openvino")
  .setInputCols("document")
  .setOutputCol("mpnet_embeddings")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  mpnetEmbeddings
))

val data = Seq(
  "William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist."
).toDF("text")

val model = pipeline.fit(data)
val result = model.transform(data)

result.select(explode($"mpnet_embeddings.embeddings").alias("embeddings")).show(false)

Results

+--------------------+
|          embeddings|
+--------------------+
|[-0.020282388, 0....|
+--------------------+

Model Information

Model Name:	all_mpnet_base_v2_openvino
Compatibility:	Spark NLP 6.0.0+
License:	Open Source
Edition:	Official
Input Labels:	[document]
Output Labels:	[mpnet_embeddings]
Language:	en
Size:	406.5 MB
