`sparknlp.partition.partition_properties`#

Contains classes for partition properties used in reading various document types.

Module Contents#

Classes#

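`HasEmailReaderProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.
`HasExcelReaderProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.
`HasHTMLReaderProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.
`HasPowerPointProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.
`HasTextReaderProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.
`HasChunkerProperties`	Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.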

class HasEmailReaderProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

addAttachmentContent[source]#

setAddAttachmentContent(value)[source]#

Sets whether to extract and include the textual content of plain-text attachments in the output.

Parameters:

value (bool): Whether to include text from plain-text attachments.

getAddAttachmentContent()[source]#

Gets whether to extract and include the textual content of plain-text attachments in the output.

Returns:

bool: Whether to include text from plain-text attachments.
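
To make the effect of this flag concrete, here is a minimal sketch using only Python's standard `email` package. It is not Spark NLP's implementation; the helper `extract_text` and its `add_attachment_content` argument are hypothetical names that only mirror the behaviour described above.

```python
from email.message import EmailMessage
from typing import List


def extract_text(message: EmailMessage, add_attachment_content: bool) -> List[str]:
    """Collect body text, optionally followed by plain-text attachment text.

    Hypothetical helper: it only illustrates the documented flag.
    """
    texts = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue  # skip multipart containers and non-text parts
        is_attachment = part.get_content_disposition() == "attachment"
        if is_attachment and not add_attachment_content:
            continue  # attachment text is left out unless the flag is set
        texts.append(part.get_content().strip())
    return texts


# A message with a plain-text body and one plain-text attachment.
msg = EmailMessage()
msg.set_content("Please see the attached notes.")
msg.add_attachment("Meeting at 10am.", filename="notes.txt")

print(extract_text(msg, add_attachment_content=False))
print(extract_text(msg, add_attachment_content=True))
```

With the flag off, only the body text is returned; with it on, the attachment text follows the body.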

class HasExcelReaderProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

cellSeparator[source]#

appendCells[source]#

setCellSeparator(value)[source]#

Sets the string used to join cell values in a row when assembling textual output.

Parameters:

value (str): Delimiter used to concatenate cell values.

getCellSeparator()[source]#

Gets the string used to join cell values in a row when assembling textual output.

Returns:

str: Delimiter used to concatenate cell values.

setAppendCells(value)[source]#

Sets whether to append all rows into a single content block.

Parameters:

value (bool): True to merge rows into one block, False for individual elements.

getAppendCells()[source]#

Gets whether to append all rows into a single content block.

Returns:

bool: True to merge rows into one block, False for individual elements.
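
The interaction between `cellSeparator` and `appendCells` can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `rows_to_text` is a hypothetical helper, and joining merged rows with a newline is an assumption that the descriptions above do not confirm.

```python
from typing import List


def rows_to_text(rows: List[List[str]], cell_separator: str, append_cells: bool) -> List[str]:
    """Join the cells of each row, then optionally merge all rows into one block."""
    lines = [cell_separator.join(row) for row in rows]
    if append_cells:
        # Assumption: merged rows are separated by a newline.
        return ["\n".join(lines)]
    return lines  # one element per row


sheet = [["name", "qty"], ["apple", "3"]]
print(rows_to_text(sheet, cell_separator="\t", append_cells=False))
print(rows_to_text(sheet, cell_separator="\t", append_cells=True))
```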

class HasHTMLReaderProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

timeout[source]#

setTimeout(value)[source]#

Sets the timeout (in seconds) for reading remote HTML resources.

Parameters:

value (int): Timeout in seconds for remote content retrieval.

getTimeout()[source]#

Gets the timeout (in seconds) for reading remote HTML resources.

Returns:

int: Timeout in seconds.

setHeaders(headers: Dict[str, str])[source]#
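
`setHeaders` is listed without a description. Assuming the `Dict[str, str]` holds HTTP request headers sent when fetching remote HTML, the sketch below shows how headers and a timeout in seconds are typically combined, using only the standard library. `build_request` and `fetch_html` are hypothetical helpers, not part of Spark NLP.

```python
from typing import Dict
from urllib.request import Request, urlopen


def build_request(url: str, headers: Dict[str, str]) -> Request:
    """Attach custom HTTP headers to a request object (no network access yet)."""
    return Request(url, headers=headers)


def fetch_html(url: str, headers: Dict[str, str], timeout: int) -> str:
    """Fetch a page, giving up after `timeout` seconds."""
    with urlopen(build_request(url, headers), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Building the request is offline; fetch_html would perform the actual call.
request = build_request(
    "https://example.com",
    {"User-Agent": "docs-example/1.0", "Accept": "text/html"},
)
print(request.header_items())
```

Note that `urllib` normalizes header names with `str.capitalize()`, so `User-Agent` is stored as `User-agent`.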

class HasPowerPointProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

includeSlideNotes[source]#

setIncludeSlideNotes(value)[source]#

Sets whether to extract speaker notes from slides.

Parameters:

value (bool): If True, notes are included as narrative text elements.

getIncludeSlideNotes()[source]#

Gets whether to extract speaker notes from slides.

Returns:

bool: True if notes are included as narrative text elements.
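
As a rough illustration of this flag, the sketch below models slides as simple records and emits speaker notes only when requested. The `Slide` class, the `slide_elements` helper, and the `"Body"` label are hypothetical; only the `NarrativeText` label for notes is taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slide:
    body: str
    notes: str = ""  # speaker notes, empty when the slide has none


def slide_elements(slides: List[Slide], include_slide_notes: bool) -> List[Tuple[str, str]]:
    """Return (element_type, text) pairs, adding notes only when the flag is set."""
    elements = []
    for slide in slides:
        elements.append(("Body", slide.body))
        if include_slide_notes and slide.notes:
            elements.append(("NarrativeText", slide.notes))
    return elements


deck = [Slide("Q3 results", notes="Mention the revenue dip."), Slide("Roadmap")]
print(slide_elements(deck, include_slide_notes=True))
```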

class HasTextReaderProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

titleLengthSize[source]#

groupBrokenParagraphs[source]#

paragraphSplit[source]#

shortLineWordThreshold[source]#

maxLineCount[source]#

threshold[source]#

setTitleLengthSize(value)[source]#

getTitleLengthSize()[source]#

setGroupBrokenParagraphs(value)[source]#

getGroupBrokenParagraphs()[source]#

setParagraphSplit(value)[source]#

getParagraphSplit()[source]#

setShortLineWordThreshold(value)[source]#

getShortLineWordThreshold()[source]#

setMaxLineCount(value)[source]#

getMaxLineCount()[source]#

setThreshold(value)[source]#

getThreshold()[source]#
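
The setter/getter pairs above follow the param-map pattern named in the class description: each setter stores a value on the instance and each getter reads it back. The sketch below imitates that pattern in plain Python so it runs without Spark. `ParamMapSketch` and `TextReaderPropertiesSketch` are hypothetical stand-ins, not the real classes, and the values used are arbitrary.

```python
from typing import Any, Dict


class ParamMapSketch:
    """Hypothetical base class holding parameter values per instance."""

    def __init__(self) -> None:
        self._param_map: Dict[str, Any] = {}

    def _set(self, **kwargs: Any) -> "ParamMapSketch":
        self._param_map.update(kwargs)
        return self  # returning self allows chained setter calls

    def _get(self, name: str) -> Any:
        return self._param_map[name]


class TextReaderPropertiesSketch(ParamMapSketch):
    """Stand-in exposing two of the documented setter/getter pairs."""

    def setTitleLengthSize(self, value: int) -> "TextReaderPropertiesSketch":
        return self._set(titleLengthSize=value)

    def getTitleLengthSize(self) -> int:
        return self._get("titleLengthSize")

    def setGroupBrokenParagraphs(self, value: bool) -> "TextReaderPropertiesSketch":
        return self._set(groupBrokenParagraphs=value)

    def getGroupBrokenParagraphs(self) -> bool:
        return self._get("groupBrokenParagraphs")


reader = TextReaderPropertiesSketch().setTitleLengthSize(50).setGroupBrokenParagraphs(True)
print(reader.getTitleLengthSize(), reader.getGroupBrokenParagraphs())
```

Values live on the instance, so two readers configured differently do not affect each other.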

class HasChunkerProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

chunkingStrategy[source]#

maxCharacters[source]#

newAfterNChars[source]#

overlap[source]#

combineTextUnderNChars[source]#

overlapAll[source]#

setChunkingStrategy(value)[source]#

setMaxCharacters(value)[source]#

setNewAfterNChars(value)[source]#

setOverlap(value)[source]#

setCombineTextUnderNChars(value)[source]#

setOverlapAll(value)[source]#
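
The chunker setters are listed without descriptions. As a generic illustration of what a maximum chunk size and an overlap usually mean in character-based chunking, here is a sliding-window sketch. It is an assumption about the general technique, not Spark NLP's algorithm; `chunk_text` is a hypothetical helper, and it ignores `newAfterNChars`, `combineTextUnderNChars`, and `overlapAll`.

```python
from typing import List


def chunk_text(text: str, max_characters: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_characters, sharing `overlap` characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be in [0, max_characters)")

    step = max_characters - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", max_characters=4, overlap=1))
```

Each chunk after the first begins with the last `overlap` characters of the previous one, which preserves some context across chunk boundaries.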
