`sparknlp.reader.reader2image`

Module Contents

Classes

Reader2Image

The Reader2Image annotator lets you read files that contain images within existing Spark NLP workflows.

class Reader2Image

The Reader2Image annotator lets you read files that contain images within existing Spark NLP workflows, so you can reuse your pipelines. It extracts structured image content from various document types using Spark NLP readers, and returns the parsed output as a structured Spark DataFrame.

Supported formats include HTML and Markdown.

Example

This example demonstrates how to load HTML files with images and process them into a structured Spark DataFrame using Reader2Image.
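A minimal sketch of such a pipeline follows. It assumes that Reader2Image exposes the same `setContentType`, `setContentPath`, and `setOutputCol` setters as the other Spark NLP readers (this page does not list them), that Spark NLP and PySpark are installed, and that `./example-images.html` is a hypothetical local file. `build_image_dataframe` is an illustrative helper name, and its imports are deferred so that defining it does not require Spark.

```python
def build_image_dataframe(content_path: str = "./example-images.html"):
    """Read images from an HTML file into a DataFrame (illustrative sketch)."""
    # Deferred imports: Spark NLP and PySpark are only needed when this runs.
    import sparknlp
    from pyspark.ml import Pipeline
    from sparknlp.reader.reader2image import Reader2Image

    spark = sparknlp.start()

    # Assumption: these three setters mirror the other Spark NLP readers.
    reader2image = (
        Reader2Image()
        .setContentType("text/html")
        .setContentPath(content_path)
        .setOutputCol("image")
    )

    # The reader takes its input from content_path, so a one-row placeholder
    # DataFrame is enough to fit and run the pipeline.
    empty_df = spark.createDataFrame([[""]]).toDF("text")
    pipeline = Pipeline(stages=[reader2image])
    return pipeline.fit(empty_df).transform(empty_df)
```

Calling `build_image_dataframe().show()` in an environment with Spark NLP available should display a DataFrame like the one described next, provided the assumptions above hold.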

Expected output:

+-------------------+--------------------+
|           fileName|               image|
+-------------------+--------------------+
|example-images.html|[{image, example-...|
|example-images.html|[{image, example-...|
+-------------------+--------------------+

Schema:

root
 |-- fileName: string (nullable = true)
 |-- image: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- annotatorType: string (nullable = true)
 |    |    |-- origin: string (nullable = true)
 |    |    |-- height: integer (nullable = false)
 |    |    |-- width: integer (nullable = false)
 |    |    |-- nChannels: integer (nullable = false)
 |    |    |-- mode: integer (nullable = false)
 |    |    |-- result: binary (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- text: string (nullable = true)
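To make the schema fields concrete, the sketch below reshapes the `result` bytes of one image element into rows of pixels using `height`, `width`, and `nChannels`. It assumes the bytes are decoded pixels stored row by row with interleaved channels, which this page does not confirm. `to_pixel_rows` is a hypothetical helper, and a plain dict stands in for a Spark Row.

```python
from typing import Dict, List, Tuple


def to_pixel_rows(element: Dict) -> List[List[Tuple[int, ...]]]:
    """Reshape flat pixel bytes into rows of per-pixel channel tuples."""
    height = element["height"]
    width = element["width"]
    channels = element["nChannels"]
    data = bytes(element["result"])

    expected = height * width * channels
    if len(data) != expected:
        # Encoded (compressed) bytes will generally not match this size.
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Assumed layout: row-major, channels interleaved per pixel.
            start = (y * width + x) * channels
            row.append(tuple(data[start:start + channels]))
        rows.append(row)
    return rows


# A 1x2 image with 3 channels, built by hand for illustration.
sample = {"height": 1, "width": 2, "nChannels": 3,
          "result": bytes([1, 2, 3, 4, 5, 6])}
pixels = to_pixel_rows(sample)
```

The size check is a cheap guard: if the element holds compressed bytes instead of raw pixels, the length will usually not equal `height * width * nChannels`, and the helper raises an error.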

name = 'Reader2Image'

outputAnnotatorType = 'image'

userMessage

promptTemplate

customPromptTemplate

useEncodedImageBytes

outputPromptColumn

setParams()

setUserMessage(value: str)

Sets a custom user message.

Parameters:

value (str): Custom user message to include.

setPromptTemplate(value: str)

Sets the format of the output prompt.

Parameters:

value (str): Prompt template format.

setCustomPromptTemplate(value: str)

Sets a custom prompt template for image models.

Parameters:

value (str): Custom prompt template string.

setUseEncodedImageBytes(value: bool)

Sets whether to use encoded image bytes or decoded pixels.

Parameters:

value (bool): If True, keeps the image bytes in their encoded (compressed) form. If False, decodes the image into a pixel matrix representation.
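One way to tell the two forms apart downstream is to check for a file signature at the start of the bytes. The sketch below is a heuristic that recognizes only PNG and JPEG signatures. `looks_encoded` is a hypothetical helper and not part of Spark NLP.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # start-of-image marker for JPEG


def looks_encoded(data: bytes) -> bool:
    """Return True if data starts with a PNG or JPEG file signature."""
    return data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE)
```

Raw pixel data could begin with the same bytes by coincidence, so treat a positive result as a hint, not as proof.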

setOutputPromptColumn(value: bool)

Enables or disables creation of a prompt column.

Parameters:

value (bool): If True, adds an additional ‘prompt’ column to the output DataFrame containing the text prompt as a Spark NLP Annotation.
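The setters documented above could be combined as in the sketch below. It assumes each setter returns the annotator so that calls can be chained, which is the usual Spark NLP convention but is not stated on this page. It also assumes a Spark session is already running when the function is called. `configure_reader` and the message text are illustrative.

```python
def configure_reader(user_message: str = "Describe this image."):
    """Build a Reader2Image with a prompt column (illustrative sketch)."""
    # Deferred import so defining the function does not require Spark NLP.
    from sparknlp.reader.reader2image import Reader2Image

    return (
        Reader2Image()
        .setUserMessage(user_message)    # custom user message
        .setUseEncodedImageBytes(True)   # keep compressed bytes
        .setOutputPromptColumn(True)     # add the 'prompt' column
    )
```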
