sparknlp.reader.reader2image

Module Contents

Classes

Reader2Image

The Reader2Image annotator lets you read files that contain images within existing Spark NLP workflows.

class Reader2Image

The Reader2Image annotator lets you read files that contain images within existing Spark NLP workflows, so that you can reuse your pipelines. It extracts structured image content from various document types using Spark NLP readers. It supports reading from many file types and returns the parsed output as a structured Spark DataFrame.

Supported formats include HTML and Markdown.

Example

This example demonstrates how to load HTML files that contain images and process them into a structured Spark DataFrame using Reader2Image.
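A minimal sketch of such a pipeline follows. It assumes that Reader2Image exposes the `setContentType`, `setContentPath`, and `setOutputCol` setters in the same way as other Spark NLP reader annotators (none of them appears in the member list on this page), and that the pipeline is fitted on an empty DataFrame because the reader loads its input from the content path. The function name `read_images` and the path `./example-images.html` are illustrative.

```python
def read_images(spark, content_path="./example-images.html"):
    """Read the images embedded in HTML files into a Spark DataFrame.

    Sketch only: the three setters used below are assumed, not confirmed
    by this page.
    """
    # Imported lazily so that defining the function needs no Spark installation.
    from pyspark.ml import Pipeline
    from sparknlp.reader.reader2image import Reader2Image

    reader2image = (
        Reader2Image()
        .setContentType("text/html")   # assumed setter: MIME type of the input
        .setContentPath(content_path)  # assumed setter: file or directory to read
        .setOutputCol("image")         # assumed setter: name of the output column
    )
    pipeline = Pipeline(stages=[reader2image])

    # The reader loads its own input, so an empty DataFrame drives the pipeline.
    empty_df = spark.createDataFrame([], "text string")
    return pipeline.fit(empty_df).transform(empty_df)
```

With an active Spark NLP session `spark` (for example, one returned by `sparknlp.start()`), calling `read_images(spark).show()` would display the resulting `fileName` and `image` columns.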

Expected output:

+-------------------+--------------------+
|           fileName|               image|
+-------------------+--------------------+
|example-images.html|[{image, example-...|
|example-images.html|[{image, example-...|
+-------------------+--------------------+

Schema:

root
 |-- fileName: string (nullable = true)
 |-- image: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- annotatorType: string (nullable = true)
 |    |    |-- origin: string (nullable = true)
 |    |    |-- height: integer (nullable = false)
 |    |    |-- width: integer (nullable = false)
 |    |    |-- nChannels: integer (nullable = false)
 |    |    |-- mode: integer (nullable = false)
 |    |    |-- result: binary (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- text: string (nullable = true)
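To make the element struct easier to inspect, the following sketch summarises one image annotation using only the field names from the schema. The helper `describe_image` and the `sample` values are hypothetical; the sample is a hand-built dictionary, not real Reader2Image output.

```python
def describe_image(annotation):
    """Summarise one element of the `image` column as a short string."""
    result = annotation["result"] or b""  # `result` is nullable in the schema
    return "{origin}: {width}x{height}, {channels} channel(s), {size} bytes".format(
        origin=annotation["origin"],
        width=annotation["width"],
        height=annotation["height"],
        channels=annotation["nChannels"],
        size=len(result),
    )


# Hypothetical annotation shaped like the element struct in the schema.
sample = {
    "annotatorType": "image",
    "origin": "example-images.html",
    "height": 2,
    "width": 3,
    "nChannels": 3,
    "mode": 16,
    "result": bytes(18),
    "metadata": {},
    "text": None,
}

print(describe_image(sample))
```

The helper only uses key lookups, so it should also work on the PySpark `Row` objects obtained by collecting the `image` column, since `Row` supports access by field name.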

name = 'Reader2Image'
outputAnnotatorType = 'image'
userMessage
promptTemplate
customPromptTemplate
setParams()
setUserMessage(value: str)

Sets a custom user message.

Parameters:
value : str

Custom user message to include.

setPromptTemplate(value: str)

Sets the format of the output prompt.

Parameters:
value : str

Prompt template format.

setCustomPromptTemplate(value: str)

Sets a custom prompt template for image models.

Parameters:
value : str

Custom prompt template string.
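As a rough illustration of how the prompt setters might be applied together, the sketch below wraps `setUserMessage` and `setCustomPromptTemplate` in a small helper. The helper `configure_prompt` and the `RecordingReader` stand-in are hypothetical and exist only so that the snippet runs without Spark. The accepted template values and placeholder syntax are not documented on this page, so the caller supplies them.

```python
def configure_prompt(reader, user_message=None, custom_template=None):
    """Apply the optional prompt settings to a Reader2Image-like object."""
    if user_message is not None:
        reader.setUserMessage(user_message)
    if custom_template is not None:
        reader.setCustomPromptTemplate(custom_template)
    return reader


class RecordingReader:
    """Hypothetical stand-in that records setter calls (not part of Spark NLP)."""

    def __init__(self):
        self.calls = []

    def setUserMessage(self, value):
        self.calls.append(("userMessage", value))
        return self

    def setCustomPromptTemplate(self, value):
        self.calls.append(("customPromptTemplate", value))
        return self


demo = configure_prompt(RecordingReader(), user_message="Describe this image.")
print(demo.calls)
```

In a real pipeline the same helper would be called with a `Reader2Image` instance in place of the stand-in.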