sparknlp.reader.reader2table

Module Contents

Classes

Reader2Table

Base class for Transformers that wrap Java/Scala implementations.

class Reader2Table

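Reads the files found at contentPath and writes their table content as document annotations to outputCol. Reader2Table builds on the base class for Transformers that wrap Java/Scala implementations; subclasses of that base should ensure the transformer Java object is available as _java_obj.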
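
The following is a minimal usage sketch, not an official example. It assumes Spark NLP and PySpark are installed, that a SparkSession has already been started (for example with sparknlp.start()), and that "text/html" is a content type Reader2Table accepts. The helper name read_tables is hypothetical. The imports sit inside the function so the snippet can be loaded without a JVM, and the pipeline is fitted on an empty DataFrame on the assumption that the reader takes its input from contentPath rather than from the input DataFrame.

```python
# Minimal usage sketch. Assumptions: Spark NLP is installed, a SparkSession is
# running (e.g. via sparknlp.start()), and "text/html" is an accepted content type.
from typing import Any


def read_tables(spark: Any, path: str, content_type: str = "text/html") -> Any:
    """Hypothetical helper: run a one-stage Reader2Table pipeline and return its output."""
    # Imported lazily so this module loads even without a JVM or Spark NLP.
    from pyspark.ml import Pipeline
    from sparknlp.reader.reader2table import Reader2Table

    reader = Reader2Table()
    reader.setContentPath(path)          # files to read
    reader.setContentType(content_type)  # MIME type of those files
    reader.setOutputCol("document")      # column that receives the annotations

    # Assumed pattern: the reader loads files from contentPath itself, so an
    # empty single-column DataFrame is enough to fit and run the pipeline.
    empty_df = spark.createDataFrame([], "string").toDF("text")
    model = Pipeline(stages=[reader]).fit(empty_df)
    return model.transform(empty_df)
```

For example, read_tables(spark, "/path/to/html_files") should return a DataFrame whose document column holds the extracted table content; the path is a placeholder.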

name = 'Reader2Table'
outputAnnotatorType = 'document'
contentPath
outputCol
contentType
explodeDocs
flattenOutput
titleThreshold
outputFormat
setParams()
setContentPath(value)

Sets the content path.

Parameters:
value : str

Path to the files to read.

setContentType(value)

Sets the content type of the files to load, as a MIME type string.

Parameters:
value : str

Content type to load, following the MIME specification.
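
If you only know the file extensions, Python's standard mimetypes module can suggest a MIME string to pass to setContentType. This is a convenience sketch: content_type_for is a hypothetical helper, and this page does not guarantee that Reader2Table supports every type mimetypes can guess.

```python
import mimetypes
from typing import Optional


def content_type_for(path: str) -> Optional[str]:
    """Hypothetical helper: guess a MIME type from a file name's extension.

    Returns None when the extension is unknown to the mimetypes registry.
    """
    mime, _encoding = mimetypes.guess_type(path)
    return mime
```

For instance, content_type_for("tables.html") returns "text/html", which could then be passed to setContentType.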

setExplodeDocs(value)

Sets whether to explode the documents into separate rows.

Parameters:
value : bool

Whether to explode the documents into separate rows.
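
To make "exploding" concrete, here is a plain-Python analogy in which each dict stands in for a DataFrame row. It is a conceptual sketch under assumed field names (path, documents, document), not Reader2Table's implementation or its actual output schema.

```python
# Conceptual sketch only: dicts stand in for DataFrame rows, and the field
# names "path", "documents" and "document" are hypothetical.
from typing import Any, Dict, List


def explode_docs(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn each row holding a list of documents into one row per document."""
    exploded: List[Dict[str, Any]] = []
    for row in rows:
        for doc in row["documents"]:
            # Copy the per-file field onto every document produced from that file.
            exploded.append({"path": row["path"], "document": doc})
    return exploded
```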

setOutputCol(value)

Sets the output column name.

Parameters:
value : str

Name of the output column.

setFlattenOutput(value)

Sets whether to flatten the output to plain text with minimal metadata.

Parameters:
value : bool

If True, the output is flattened to plain text with minimal metadata.

setTitleThreshold(value)

Sets the minimum font size threshold for title detection in PDF documents.

Parameters:
value : float

Minimum font size threshold for title detection in PDF documents.
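
The idea of a font-size threshold can be illustrated with a small hypothetical sketch. It assumes each text element carries its font size and that an element counts as a title when its size is greater than or equal to the threshold; Reader2Table's actual PDF heuristics are not documented here and may differ. TextElement and detect_titles are illustrative names only.

```python
# Hypothetical illustration of threshold-based title detection.
# Assumption: an element is a title when font_size >= threshold.
from dataclasses import dataclass
from typing import List


@dataclass
class TextElement:
    text: str
    font_size: float


def detect_titles(elements: List[TextElement], threshold: float) -> List[str]:
    """Return the text of elements whose font size meets the threshold."""
    return [e.text for e in elements if e.font_size >= threshold]
```

Raising the threshold makes detection stricter: fewer, larger text runs are treated as titles.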

setOutputFormat(value)

Sets the output format for the table content.

Parameters:
value : str

Output format for the table content. Accepted values include 'json-table' (the default), 'html-table', and 'plain-text'.
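
To show how two table output formats can differ, the hypothetical sketch below renders the same small table once as JSON and once as HTML. The exact structure Reader2Table emits for 'json-table' or 'html-table' is not reproduced here; the {"header", "rows"} JSON layout and the helper names are assumptions for illustration.

```python
# Hypothetical sketch: the same table in a JSON form and an HTML form.
# The JSON layout used here is assumed, not Reader2Table's actual schema.
import html
import json
from typing import List


def to_json_table(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a table as a JSON object with header and rows."""
    return json.dumps({"header": header, "rows": rows})


def to_html_table(header: List[str], rows: List[List[str]]) -> str:
    """Render a table as minimal HTML, escaping cell text."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

A JSON form is convenient for downstream parsing, while an HTML form keeps the table renderable; that trade-off is the usual reason to offer both.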