Packages

package cv

Type Members

  1. class CLIPForZeroShotClassification extends AnnotatorModel[CLIPForZeroShotClassification] with HasBatchedAnnotateImage[CLIPForZeroShotClassification] with HasImageFeatureProperties with WriteTensorflowModel with WriteOnnxModel with HasEngine with HasRescaleFactor

    Zero Shot Image Classifier based on CLIP.

    CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on image and text pairs. It can predict the class of an image without being trained on any hard-coded labels, which makes it very flexible: candidate labels can simply be provided at inference time. This is similar to the zero-shot capabilities of the GPT-2 and GPT-3 models.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val imageClassifier = CLIPForZeroShotClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("label")

    The default model is "zero_shot_classifier_clip_vit_base_patch32", if no name is provided.

    For available pretrained models, please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669; for more extended examples, see CLIPForZeroShotClassificationTestSpec.
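    For illustration, a hedged sketch of loading such an exported model locally, assuming the loadSavedModel(path, spark) loader contributed by the companion object's Read* trait; the path below is hypothetical:

    // Sketch: load a CLIP model previously exported from HuggingFace Transformers.
    val importedClassifier = CLIPForZeroShotClassification
      .loadSavedModel("/models/clip-vit-base-patch32", spark) // hypothetical export path
      .setInputCols("image_assembler")
      .setOutputCol("label")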

    Example

    import com.johnsnowlabs.nlp.ImageAssembler
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val imageDF = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")
    
    val imageAssembler: ImageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")
    
    val candidateLabels = Array(
      "a photo of a bird",
      "a photo of a cat",
      "a photo of a dog",
      "a photo of a hen",
      "a photo of a hippo",
      "a photo of a room",
      "a photo of a tractor",
      "a photo of an ostrich",
      "a photo of an ox")
    
    val imageClassifier = CLIPForZeroShotClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("label")
      .setCandidateLabels(candidateLabels)
    
    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)

    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "label.result")
      .show(truncate = false)
    +-----------------+-----------------------+
    |image_name       |result                 |
    +-----------------+-----------------------+
    |palace.JPEG      |[a photo of a room]    |
    |egyptian_cat.jpeg|[a photo of a cat]     |
    |hippopotamus.JPEG|[a photo of a hippo]   |
    |hen.JPEG         |[a photo of a hen]     |
    |ostrich.JPEG     |[a photo of an ostrich]|
    |junco.JPEG       |[a photo of a bird]    |
    |bluetick.jpg     |[a photo of a dog]     |
    |chihuahua.jpg    |[a photo of a dog]     |
    |tractor.JPEG     |[a photo of a tractor] |
    |ox.JPEG          |[a photo of an ox]     |
    +-----------------+-----------------------+
  2. class ConvNextForImageClassification extends SwinForImageClassification

    ConvNextForImageClassification is an image classifier based on the ConvNeXT model.

    The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell and Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val imageClassifier = ConvNextForImageClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

    The default model is "image_classifier_convnext_tiny_224_local", if no name is provided.

    For available pretrained models, please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669; for more extended examples, see ConvNextForImageClassificationTestSpec.

    References:

    A ConvNet for the 2020s

    Paper Abstract:

    The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

    Example

    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")
    
    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")
    
    val imageClassifier = ConvNextForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")
    
    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)
    
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
      .show(truncate = false)
    +-----------------+----------------------------------------------------------+
    |image_name       |result                                                    |
    +-----------------+----------------------------------------------------------+
    |palace.JPEG      |[palace]                                                  |
    |egyptian_cat.jpeg|[tabby, tabby cat]                                        |
    |hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
    |hen.JPEG         |[hen]                                                     |
    |ostrich.JPEG     |[ostrich, Struthio camelus]                               |
    |junco.JPEG       |[junco, snowbird]                                         |
    |bluetick.jpg     |[bluetick]                                                |
    |chihuahua.jpg    |[Chihuahua]                                               |
    |tractor.JPEG     |[tractor]                                                 |
    |ox.JPEG          |[ox]                                                      |
    +-----------------+----------------------------------------------------------+
  3. trait HasRescaleFactor extends AnyRef

    Enables parameters to handle rescaling for image pre-processors.
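    For illustration, a minimal sketch of configuring an annotator that mixes in this trait, assuming the setDoRescale and setRescaleFactor setters its implementors expose:

    // Sketch: configure pixel rescaling on an implementor of HasRescaleFactor.
    val rescaledClassifier = ConvNextForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")
      .setDoRescale(true)          // enable rescaling during image preprocessing
      .setRescaleFactor(1 / 255.0) // map raw 0-255 pixel values into [0, 1]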

  4. trait ReadCLIPForZeroShotClassificationModel extends ReadTensorflowModel with ReadOnnxModel
  5. trait ReadConvNextForImageDLModel extends ReadTensorflowModel
  6. trait ReadSwinForImageDLModel extends ReadTensorflowModel
  7. trait ReadViTForImageDLModel extends ReadTensorflowModel
  8. trait ReadVisionEncoderDecoderDLModel extends ReadTensorflowModel
  9. trait ReadablePretrainedCLIPForZeroShotClassificationModel extends ParamsAndFeaturesReadable[CLIPForZeroShotClassification] with HasPretrained[CLIPForZeroShotClassification]
  10. trait ReadablePretrainedConvNextForImageModel extends ParamsAndFeaturesReadable[ConvNextForImageClassification] with HasPretrained[ConvNextForImageClassification]
  11. trait ReadablePretrainedSwinForImageModel extends ParamsAndFeaturesReadable[SwinForImageClassification] with HasPretrained[SwinForImageClassification]
  12. trait ReadablePretrainedViTForImageModel extends ParamsAndFeaturesReadable[ViTForImageClassification] with HasPretrained[ViTForImageClassification]
  13. trait ReadablePretrainedVisionEncoderDecoderModel extends ParamsAndFeaturesReadable[VisionEncoderDecoderForImageCaptioning] with HasPretrained[VisionEncoderDecoderForImageCaptioning]
  14. class SwinForImageClassification extends ViTForImageClassification with HasRescaleFactor

    SwinForImageClassification is an image classifier based on the Swin Transformer.

    The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo.

    Swin is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val imageClassifier = SwinForImageClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

    The default model is "image_classifier_swin_base_patch4_window7_224", if no name is provided.

    For available pretrained models, please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669; for more extended examples, see SwinForImageClassificationTest.

    References:

    Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

    Paper Abstract:

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (87.3 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures.

    Example

    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")
    
    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")
    
    val imageClassifier = SwinForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")
    
    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)
    
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
      .show(truncate = false)
    +-----------------+----------------------------------------------------------+
    |image_name       |result                                                    |
    +-----------------+----------------------------------------------------------+
    |palace.JPEG      |[palace]                                                  |
    |egyptian_cat.jpeg|[tabby, tabby cat]                                        |
    |hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
    |hen.JPEG         |[hen]                                                     |
    |ostrich.JPEG     |[ostrich, Struthio camelus]                               |
    |junco.JPEG       |[junco, snowbird]                                         |
    |bluetick.jpg     |[bluetick]                                                |
    |chihuahua.jpg    |[Chihuahua]                                               |
    |tractor.JPEG     |[tractor]                                                 |
    |ox.JPEG          |[ox]                                                      |
    +-----------------+----------------------------------------------------------+
  15. class ViTForImageClassification extends AnnotatorModel[ViTForImageClassification] with HasBatchedAnnotateImage[ViTForImageClassification] with HasImageFeatureProperties with WriteTensorflowModel with HasEngine

    Vision Transformer (ViT) for image classification.

    ViT is a transformer-based alternative to the convolutional neural networks usually used for image recognition tasks.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val imageClassifier = ViTForImageClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

    The default model is "image_classifier_vit_base_patch16_224", if no name is provided.

    For available pretrained models, please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669; for more extended examples, see VisionEncoderDecoderTestSpec.

    References:

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Paper Abstract:

    While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

    Example

    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")
    
    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")
    
    val imageClassifier = ViTForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")
    
    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)
    
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
      .show(truncate = false)
    +-----------------+----------------------------------------------------------+
    |image_name       |result                                                    |
    +-----------------+----------------------------------------------------------+
    |palace.JPEG      |[palace]                                                  |
    |egyptian_cat.jpeg|[Egyptian cat]                                            |
    |hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
    |hen.JPEG         |[hen]                                                     |
    |ostrich.JPEG     |[ostrich, Struthio camelus]                               |
    |junco.JPEG       |[junco, snowbird]                                         |
    |bluetick.jpg     |[bluetick]                                                |
    |chihuahua.jpg    |[Chihuahua]                                               |
    |tractor.JPEG     |[tractor]                                                 |
    |ox.JPEG          |[ox]                                                      |
    +-----------------+----------------------------------------------------------+
  16. class VisionEncoderDecoderForImageCaptioning extends AnnotatorModel[VisionEncoderDecoderForImageCaptioning] with HasBatchedAnnotateImage[VisionEncoderDecoderForImageCaptioning] with HasImageFeatureProperties with WriteTensorflowModel with HasEngine with HasRescaleFactor with HasGeneratorProperties

    VisionEncoderDecoder model that converts images into text captions. It allows for the use of pretrained vision auto-encoding models, such as ViT, BEiT, or DeiT as the encoder, in combination with pretrained language models, like RoBERTa, GPT2, or BERT as the decoder.
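    Since the class mixes in HasGeneratorProperties, text generation can be tuned before running the pipeline. A hedged sketch, assuming the usual generator setters such as setMaxOutputLength alongside the setBeamSize and setDoSample calls shown in the example below:

    // Sketch: tune caption generation via HasGeneratorProperties setters.
    val tunedCaptioner = VisionEncoderDecoderForImageCaptioning
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("caption")
      .setBeamSize(3)         // wider beam search
      .setMaxOutputLength(30) // cap caption length in tokens
      .setDoSample(false)     // deterministic decoding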

    Pretrained models can be loaded with the pretrained method of the companion object:

    val imageCaptioning = VisionEncoderDecoderForImageCaptioning.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("caption")

    The default model is "image_captioning_vit_gpt2", if no name is provided.

    For available pretrained models, please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669; for more extended examples, see VisionEncoderDecoderTestSpec.

    Note:

    This module is computationally expensive, especially at larger batch sizes. The use of an accelerator such as a GPU is recommended.
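    If no accelerator is available, a smaller batch size can reduce memory pressure. A minimal sketch, assuming the setBatchSize setter from HasBatchedAnnotateImage:

    // Sketch: reduce the per-step batch size when running on CPU.
    val cpuCaptioner = VisionEncoderDecoderForImageCaptioning
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("caption")
      .setBatchSize(2) // illustrative value; tune to available memory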

    Example

    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")
    
    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")
    
    val imageCaptioning = VisionEncoderDecoderForImageCaptioning
      .pretrained()
      .setBeamSize(2)
      .setDoSample(false)
      .setInputCols("image_assembler")
      .setOutputCol("caption")
    
    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageCaptioning))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)
    
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "caption.result")
      .show(truncate = false)
    
    +-----------------+---------------------------------------------------------+
    |image_name       |result                                                   |
    +-----------------+---------------------------------------------------------+
    |palace.JPEG      |[a large room filled with furniture and a large window]  |
    |egyptian_cat.jpeg|[a cat laying on a couch next to another cat]            |
    |hippopotamus.JPEG|[a brown bear in a body of water]                        |
    |hen.JPEG         |[a flock of chickens standing next to each other]        |
    |ostrich.JPEG     |[a large bird standing on top of a lush green field]     |
    |junco.JPEG       |[a small bird standing on a wet ground]                  |
    |bluetick.jpg     |[a small dog standing on a wooden floor]                 |
    |chihuahua.jpg    |[a small brown dog wearing a blue sweater]               |
    |tractor.JPEG     |[a man is standing in a field with a tractor]            |
    |ox.JPEG          |[a large brown cow standing on top of a lush green field]|
    +-----------------+---------------------------------------------------------+

Value Members

  1. object CLIPForZeroShotClassification extends ReadablePretrainedCLIPForZeroShotClassificationModel with ReadCLIPForZeroShotClassificationModel with Serializable

    This is the companion object of CLIPForZeroShotClassification. Please refer to that class for the documentation.
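    For illustration, HasPretrained typically also provides overloads that take an explicit model name and language; a hedged sketch:

    // Sketch: load a specific pretrained model by name and language.
    val namedClassifier = CLIPForZeroShotClassification
      .pretrained("zero_shot_classifier_clip_vit_base_patch32", "en") // model name, language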

  2. object ConvNextForImageClassification extends ReadablePretrainedConvNextForImageModel with ReadConvNextForImageDLModel with Serializable

    This is the companion object of ConvNextForImageClassification. Please refer to that class for the documentation.

  3. object SwinForImageClassification extends ReadablePretrainedSwinForImageModel with ReadSwinForImageDLModel with Serializable

    This is the companion object of SwinForImageClassification. Please refer to that class for the documentation.

  4. object ViTForImageClassification extends ReadablePretrainedViTForImageModel with ReadViTForImageDLModel with Serializable

    This is the companion object of ViTForImageClassification. Please refer to that class for the documentation.

  5. object VisionEncoderDecoderForImageCaptioning extends ReadablePretrainedVisionEncoderDecoderModel with ReadVisionEncoderDecoderDLModel with Serializable

    This is the companion object of VisionEncoderDecoderForImageCaptioning. Please refer to that class for the documentation.
