sparknlp.annotator.cv.florence2_transformer#
Module Contents#
Classes#
Florence2Transformer | Florence2Transformer can load Florence-2 models for a variety of vision and vision-language tasks using prompt-based inference.
- class Florence2Transformer(classname='com.johnsnowlabs.nlp.annotators.cv.Florence2Transformer', java_model=None)[source]#
Florence2Transformer can load Florence-2 models for a variety of vision and vision-language tasks using prompt-based inference.
The model supports image captioning, object detection, segmentation, OCR, and more, using prompt tokens as described in the Florence-2 documentation.
Pretrained models can be loaded with pretrained() of the companion object:

>>> florence2 = Florence2Transformer.pretrained() \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer")
The default model is "florence2_base_ft_int4", if no name is provided. For available pretrained models, please see the Models Hub.
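A specific model can also be requested by name and language instead of the default. A minimal sketch, assuming the default model is published under the "en" language code on the Models Hub:

>>> florence2 = Florence2Transformer.pretrained("florence2_base_ft_int4", "en") \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer")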
Input Annotation types | Output Annotation type
IMAGE | DOCUMENT
- Parameters:
- batchSize
Batch size. Larger values allow faster processing but require more memory, by default 2
- maxOutputLength
Maximum length of output text, by default 200
- minOutputLength
Minimum length of the sequence to be generated, by default 10
- doSample
Whether or not to use sampling; use greedy decoding otherwise, by default False
- temperature
The value used to modulate the next token probabilities, by default 1.0
- topK
The number of highest probability vocabulary tokens to keep for top-k-filtering, by default 50
- topP
If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation, by default 1.0
- repetitionPenalty
The parameter for repetition penalty. 1.0 means no penalty, by default 1.0
- noRepeatNgramSize
If set to int > 0, all ngrams of that size can only occur once, by default 3
- ignoreTokenIds
A list of token ids which are ignored in the decoder’s output, by default []
- beamSize
The number of beams for beam search, by default 1
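These generation parameters are tuned through setter methods on the annotator. A minimal sketch: setTopK and setRepetitionPenalty are documented below, while the remaining setter names are assumed to follow Spark NLP's usual set<ParamName> convention for the parameters listed above:

>>> florence2 = Florence2Transformer.pretrained() \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer") \
...     .setTopK(50) \
...     .setRepetitionPenalty(1.0) \
...     .setMaxOutputLength(200) \
...     .setBeamSize(1)  # setMaxOutputLength/setBeamSize assumed from the parameter names above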
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.sql.functions import lit
>>> from pyspark.ml import Pipeline
>>> image_df = spark.read.format("image").load(path=images_path)
>>> test_df = image_df.withColumn("text", lit("<OD>"))
>>> imageAssembler = ImageAssembler() \
...     .setInputCol("image") \
...     .setOutputCol("image_assembler")
>>> florence2 = Florence2Transformer.pretrained() \
...     .setInputCols(["image_assembler"]) \
...     .setOutputCol("answer")
>>> pipeline = Pipeline().setStages([
...     imageAssembler,
...     florence2
... ])
>>> result = pipeline.fit(test_df).transform(test_df)
>>> result.select("image_assembler.origin", "answer.result").show(truncate=False)
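The task is selected by the prompt placed in the text column, so the same pipeline can be reused for other tasks. A minimal sketch, assuming the "<CAPTION>" task token from the Florence-2 documentation is supported by the loaded model:

>>> caption_df = image_df.withColumn("text", lit("<CAPTION>"))  # "<CAPTION>" assumed supported
>>> result = pipeline.fit(caption_df).transform(caption_df)
>>> result.select("answer.result").show(truncate=False)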
- setTopK(value)[source]#
Sets the number of highest probability vocabulary tokens to keep for top-k-filtering.
- setRepetitionPenalty(value)[source]#
Sets the parameter for repetition penalty. 1.0 means no penalty.