sparknlp.annotator.cv.florence2_transformer#

Module Contents#

Classes#

Florence2Transformer

Florence2Transformer can load Florence-2 models for a variety of vision and vision-language tasks using prompt-based inference.

class Florence2Transformer(classname='com.johnsnowlabs.nlp.annotators.cv.Florence2Transformer', java_model=None)[source]#

Florence2Transformer can load Florence-2 models for a variety of vision and vision-language tasks using prompt-based inference.

The model supports image captioning, object detection, segmentation, OCR, and more, using prompt tokens as described in the Florence-2 documentation.

Pretrained models can be loaded with pretrained() of the companion object:

>>> florence2 = Florence2Transformer.pretrained() \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer")

If no name is provided, the default model is "florence2_base_ft_int4".

For available pretrained models please see the Models Hub.

Input Annotation types: IMAGE

Output Annotation type: DOCUMENT

Parameters:
batchSize

Batch size. Larger values allow faster processing but require more memory, by default 2

maxOutputLength

Maximum length of output text, by default 200

minOutputLength

Minimum length of the sequence to be generated, by default 10

doSample

Whether or not to use sampling; use greedy decoding otherwise, by default False

temperature

The value used to modulate the next token probabilities, by default 1.0

topK

The number of highest probability vocabulary tokens to keep for top-k-filtering, by default 50

topP

If set to a float < 1, only the smallest set of most probable tokens whose probabilities add up to topP or higher is kept for generation, by default 1.0

repetitionPenalty

The parameter for repetition penalty. 1.0 means no penalty, by default 1.0

noRepeatNgramSize

If set to an int > 0, no n-gram of that size can occur more than once, by default 3

ignoreTokenIds

A list of token ids which are ignored in the decoder’s output, by default []

beamSize

The number of beams for beam search, by default 1
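The topK and topP parameters above can be sketched conceptually in plain Python. This is a minimal model of top-k and top-p (nucleus) filtering as commonly defined, not the annotator's internal implementation:

```python
def top_k_top_p_filter(probs, top_k=50, top_p=1.0):
    """Illustrative sketch of top-k/top-p (nucleus) filtering.

    `probs` maps token -> probability. Returns the tokens that survive
    filtering; sampling would then draw from these after renormalizing.
    """
    # Sort tokens by descending probability.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability
    # reaches top_p or higher.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_top_p_filter(probs, top_k=3, top_p=0.8))  # ['a', 'b']
```

With topP at its default of 1.0, the cumulative cutoff never triggers early, so only topK limits the candidate set.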

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> image_df = spark.read.format("image").load(path=images_path)
>>> test_df = image_df.withColumn("text", lit("<OD>"))
>>> imageAssembler = ImageAssembler() \
...     .setInputCol("image") \
...     .setOutputCol("image_assembler")
>>> florence2 = Florence2Transformer.pretrained() \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer")
>>> pipeline = Pipeline().setStages([
...     imageAssembler,
...     florence2
... ])
>>> result = pipeline.fit(test_df).transform(test_df)
>>> result.select("image_assembler.origin", "answer.result").show(truncate=False)
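The example above uses the `<OD>` (object detection) prompt in the `text` column. A small helper like the following can make the available prompt tokens explicit; the token strings are taken from the Florence-2 model documentation (verify them against the model version you load), and the helper name is illustrative, not part of Spark NLP's API:

```python
# Illustrative mapping of common Florence-2 tasks to their prompt tokens.
FLORENCE2_TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
}

def prompt_for(task):
    """Return the prompt token for a task name, e.g. to populate the
    `text` column used alongside the image in the pipeline above."""
    try:
        return FLORENCE2_TASK_PROMPTS[task]
    except KeyError:
        raise ValueError(f"Unknown Florence-2 task: {task!r}")

print(prompt_for("object_detection"))  # <OD>
```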
name = 'Florence2Transformer'[source]#
inputAnnotatorTypes[source]#
outputAnnotatorType = 'document'[source]#
minOutputLength[source]#
maxOutputLength[source]#
doSample[source]#
temperature[source]#
topK[source]#
topP[source]#
repetitionPenalty[source]#
noRepeatNgramSize[source]#
ignoreTokenIds[source]#
beamSize[source]#
batchSize[source]#
setMinOutputLength(value)[source]#

Sets minimum length of the sequence to be generated.

setMaxOutputLength(value)[source]#

Sets maximum length of output text.

setDoSample(value)[source]#

Sets whether or not to use sampling; use greedy decoding otherwise.

setTemperature(value)[source]#

Sets the value used to modulate the next token probabilities.

setTopK(value)[source]#

Sets the number of highest probability vocabulary tokens to keep for top-k-filtering.

setTopP(value)[source]#

Sets the top cumulative probability for vocabulary tokens.

setRepetitionPenalty(value)[source]#

Sets the parameter for repetition penalty. 1.0 means no penalty.

setNoRepeatNgramSize(value)[source]#

Sets size of n-grams that can only occur once.
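The constraint this sets can be sketched as a check in plain Python. This is a conceptual model of the noRepeatNgramSize guarantee over generated output, not how the decoder enforces it (which works by banning candidate tokens that would complete a repeated n-gram):

```python
def violates_no_repeat_ngram(tokens, ngram_size=3):
    """Sketch of the noRepeatNgramSize constraint: returns True if any
    n-gram of the given size appears more than once in `tokens`."""
    seen = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False

print(violates_no_repeat_ngram(["a", "b", "c", "a", "b", "c"], 3))  # True
print(violates_no_repeat_ngram(["a", "b", "c", "d"], 3))            # False
```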

setIgnoreTokenIds(value)[source]#

Sets a list of token ids which are ignored in the decoder's output.

setBeamSize(value)[source]#

Sets the number of beams for beam search.

setBatchSize(value)[source]#

Sets the batch size.

static loadSavedModel(folder, spark_session, use_openvino=False)[source]#

Loads a locally saved model.

static pretrained(name='florence2_base_ft_int4', lang='en', remote_loc=None)[source]#

Downloads and loads a pretrained model.