sparknlp.annotator.param.evaluation_dl_params#

Module Contents#

Classes#

EvaluationDLParams

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

class EvaluationDLParams[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

setVerbose(value)[source]#

Sets the level of verbosity during training.

Parameters:
value : int

Level of verbosity
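
A minimal usage sketch, using NerDLApproach purely as an illustration (any annotator that mixes in these parameters works the same way):

>>> from sparknlp.annotator import NerDLApproach
>>> # verbosity level 1 is an arbitrary example value
>>> nerTagger = NerDLApproach() \
...     .setVerbose(1)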

setValidationSplit(v)[source]#

Sets the proportion of the training dataset to be validated against the model on each epoch, by default 0.0 (off). The value should be between 0.0 and 1.0.

Parameters:
v : float

Proportion of training dataset to be validated
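
For example, to hold out 20% of the training data for validation on each epoch (NerDLApproach again serves only as an illustration):

>>> from sparknlp.annotator import NerDLApproach
>>> # 0.2 means 20% of the training data is held out for validation
>>> nerTagger = NerDLApproach() \
...     .setValidationSplit(0.2)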

setEvaluationLogExtended(v)[source]#

Sets whether validation logs should be extended, by default False. If enabled, the time and the evaluation of each label are displayed.

Parameters:
v : bool

Whether validation logs should be extended
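
For example, combined with a validation split so that there is an evaluation to log (NerDLApproach as illustration):

>>> from sparknlp.annotator import NerDLApproach
>>> # extended logs report time and per-label evaluation on the validation set
>>> nerTagger = NerDLApproach() \
...     .setValidationSplit(0.2) \
...     .setEvaluationLogExtended(True)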

setEnableOutputLogs(value)[source]#

Sets whether to use stdout in addition to Spark logs, by default False.

Parameters:
value : bool

Whether to use stdout in addition to Spark logs
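
For example, to mirror the training logs to stdout (NerDLApproach as illustration):

>>> from sparknlp.annotator import NerDLApproach
>>> nerTagger = NerDLApproach() \
...     .setEnableOutputLogs(True)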

setOutputLogsPath(p)[source]#

Sets the folder path where training logs are saved.

Parameters:
p : str

Folder path to save training logs
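
For example, to write the logs to a local folder (the folder name "ner_logs" is an arbitrary choice for this sketch):

>>> from sparknlp.annotator import NerDLApproach
>>> nerTagger = NerDLApproach() \
...     .setEnableOutputLogs(True) \
...     .setOutputLogsPath("ner_logs")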

setTestDataset(path, read_as=ReadAs.SPARK, options={'format': 'parquet'})[source]#

Sets the path to a parquet file of a test dataset. If set, the dataset is used to calculate statistics during training.

The parquet file must be a dataframe that has the same columns as the model that is being trained. For example, if the model takes DOCUMENT, TOKEN and WORD_EMBEDDINGS as input (features) and NAMED_ENTITY as label, then these columns also need to be present when saving the dataframe. The pre-processing steps applied to the training dataframe should also be applied to the test dataframe.

An example of how to create such a parquet file could be:

>>> # assuming preProcessingPipeline is already defined
>>> (train, test) = data.randomSplit([0.8, 0.2])
>>> preProcessingPipeline \
...     .fit(test) \
...     .transform(test) \
...     .write \
...     .mode("overwrite") \
...     .parquet("test_data")
>>> annotator.setTestDataset("test_data")
Parameters:
path : str

Path to test dataset

read_as : str, optional

How to read the resource, by default ReadAs.SPARK

options : dict, optional

Options for reading the resource, by default {"format": "parquet"}
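
Taken together, a minimal sketch of how these evaluation parameters might be combined on a single annotator (NerDLApproach for illustration; the column names, paths and values are placeholders, not recommendations):

>>> from sparknlp.annotator import NerDLApproach
>>> nerTagger = NerDLApproach() \
...     .setInputCols(["document", "token", "embeddings"]) \
...     .setLabelColumn("label") \
...     .setOutputCol("ner") \
...     .setValidationSplit(0.2) \
...     .setEvaluationLogExtended(True) \
...     .setEnableOutputLogs(True) \
...     .setOutputLogsPath("ner_logs") \
...     .setTestDataset("test_data")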