sparknlp.base.image_assembler

Contains classes for the ImageAssembler.

Module Contents

Classes

ImageAssembler

Prepares images read by Spark into a format that is processable by Spark NLP.

class ImageAssembler

Prepares images read by Spark into a format that is processable by Spark NLP. This component is needed to process images.

Input Annotation types: NONE

Output Annotation type: IMAGE

Parameters:
inputCol

Input column name

outputCol

Output column name

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from pyspark.ml import Pipeline
>>> spark = sparknlp.start()
>>> data = spark.read.format("image").load("./tmp/images/").toDF("image")
>>> imageAssembler = ImageAssembler().setInputCol("image").setOutputCol("image_assembler")
>>> result = imageAssembler.transform(data)
>>> result.select("image_assembler").show()
>>> result.select("image_assembler").printSchema()
root
 |-- image_assembler: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- annotatorType: string (nullable = true)
 |    |    |-- origin: string (nullable = true)
 |    |    |-- height: integer (nullable = true)
 |    |    |-- width: integer (nullable = true)
 |    |    |-- nChannels: integer (nullable = true)
 |    |    |-- mode: integer (nullable = true)
 |    |    |-- result: binary (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
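
As a rough illustration of the schema above, the sketch below models one element of the output array as a plain Python dataclass and copies the fields of a Spark image row into it. ImageRow, ImageAnnotationSketch, and to_annotation are hypothetical names used only for illustration, and the "image" value for annotatorType is an assumption. This is a sketch of the data layout, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ImageRow:
    """Hypothetical stand-in for one row of Spark's image struct."""
    origin: str      # URI of the source file
    height: int      # image height in pixels
    width: int       # image width in pixels
    nChannels: int   # number of colour channels
    mode: int        # OpenCV-compatible type code
    data: bytes      # raw pixel bytes


@dataclass
class ImageAnnotationSketch:
    """Hypothetical mirror of one element in the image_assembler column."""
    annotatorType: str
    origin: str
    height: int
    width: int
    nChannels: int
    mode: int
    result: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def to_annotation(row: ImageRow) -> ImageAnnotationSketch:
    """Copy image fields into the annotation layout shown in the schema."""
    return ImageAnnotationSketch(
        annotatorType="image",  # assumed value for the IMAGE annotation type
        origin=row.origin,
        height=row.height,
        width=row.width,
        nChannels=row.nChannels,
        mode=row.mode,
        result=row.data,        # pixel bytes land in the binary result field
    )


# A 2x3 image with 3 channels holds 2 * 3 * 3 = 18 bytes.
# Mode 16 is the OpenCV type code CV_8UC3.
row = ImageRow("file:///tmp/images/a.png", 2, 3, 3, 16, bytes(18))
annotation = to_annotation(row)
```

The point of the sketch is only that the image struct's fields carry over one to one, with the pixel bytes stored under result and an initially empty metadata map.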

setInputCol(value)

Sets input column name.

Parameters:
value : str

Name of the input column holding images in Spark's image format, as loaded via spark.read.format("image").load(PATH)
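
Because the input column must hold Spark's image struct, it can help to check the column's fields before calling setInputCol. The helper below is a minimal sketch in plain Python. The name missing_image_fields is hypothetical, and the expected field list is the one produced by Spark's built-in "image" data source.

```python
from typing import Iterable, List

# Field names of the struct produced by Spark's built-in "image" data source.
EXPECTED_IMAGE_FIELDS = ("origin", "height", "width", "nChannels", "mode", "data")


def missing_image_fields(field_names: Iterable[str]) -> List[str]:
    """Return the expected image fields that are absent from field_names."""
    present = set(field_names)
    return [name for name in EXPECTED_IMAGE_FIELDS if name not in present]


# A complete image struct has nothing missing.
complete = missing_image_fields(EXPECTED_IMAGE_FIELDS)

# A column with only two of the fields reports the other four, in order.
partial = missing_image_fields(["origin", "data"])
```

With a DataFrame such as data from the example above, the field names could be obtained with data.schema["image"].dataType.fieldNames() and passed to the helper. An empty result suggests the column is in the expected format.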

setOutputCol(value)

Sets output column name.

Parameters:
value : str

Name of the output column

getOutputCol()

Gets the name of the output column that holds the annotations.