sparknlp.common.properties#

Contains classes for Annotator properties.

Module Contents#

Classes#

HasEmbeddingsProperties

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

Functions#

setTask(self, value)

Sets the transformer's task, e.g. summarize:.

setMinOutputLength(self, value)

Sets minimum length of the sequence to be generated.

setMaxOutputLength(self, value)

Sets maximum length of output text.

setDoSample(self, value)

Sets whether or not to use sampling; use greedy decoding otherwise.

setTemperature(self, value)

Sets the value used to modulate the next token probabilities.

setTopK(self, value)

Sets the number of highest probability vocabulary tokens to keep for top-k filtering.

setTopP(self, value)

Sets the top cumulative probability for vocabulary tokens.

setRepetitionPenalty(self, value)

Sets the parameter for repetition penalty. 1.0 means no penalty.

setNoRepeatNgramSize(self, value)

Sets size of n-grams that can only occur once.

setBeamSize(self, value)

Sets the beam size for beam search.

setNReturnSequences(self, value)

Sets the number of sequences to return from the beam search.

class HasEmbeddingsProperties[source]#

Components that take parameters. This also provides an internal param map to store parameter values attached to the instance.

New in version 1.3.0.

setDimension(value)[source]#

Sets embeddings dimension.

Parameters:
value : int

Embeddings dimension

getDimension()[source]#

Gets embeddings dimension.
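
A minimal sketch of how the dimension accessors might be used, assuming an embeddings annotator such as BertEmbeddings mixes in HasEmbeddingsProperties; the reported dimension depends on the pretrained model and the values shown are illustrative only.

>>> from sparknlp.annotator import BertEmbeddings
>>> embeddings = BertEmbeddings.pretrained() \
...     .setInputCols(["document", "token"]) \
...     .setOutputCol("embeddings")
>>> dim = embeddings.getDimension()  # reported by the pretrained model, e.g. 768
>>> embeddings.setDimension(768)     # rarely set by hand; usually fixed by the model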

setTask(self, value)[source]#

Sets the transformer’s task, e.g. summarize:.

Parameters:
value : str

The transformer’s task

setMinOutputLength(self, value)[source]#

Sets minimum length of the sequence to be generated.

Parameters:
value : int

Minimum length of the sequence to be generated

setMaxOutputLength(self, value)[source]#

Sets maximum length of output text.

Parameters:
value : int

Maximum length of output text
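
A minimal sketch combining the task and output-length setters, assuming a seq2seq annotator such as T5Transformer mixes in these properties; the model name and values are illustrative.

>>> from sparknlp.annotator import T5Transformer
>>> t5 = T5Transformer.pretrained("t5_small") \
...     .setInputCols(["document"]) \
...     .setOutputCol("summary") \
...     .setTask("summarize:") \
...     .setMinOutputLength(10) \
...     .setMaxOutputLength(100)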

setDoSample(self, value)[source]#

Sets whether or not to use sampling; use greedy decoding otherwise.

Parameters:
value : bool

Whether or not to use sampling; use greedy decoding otherwise

setTemperature(self, value)[source]#

Sets the value used to modulate the next token probabilities.

Parameters:
value : float

The value used to modulate the next token probabilities

setTopK(self, value)[source]#

Sets the number of highest probability vocabulary tokens to keep for top-k-filtering.

Parameters:
value : int

Number of highest probability vocabulary tokens to keep

setTopP(self, value)[source]#

Sets the top cumulative probability for vocabulary tokens.

If set to float < 1, only the most probable tokens with probabilities that add up to topP or higher are kept for generation.

Parameters:
value : float

Cumulative probability for vocabulary tokens
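
A sketch of how the sampling-related setters might fit together on the same kind of annotator; the values are illustrative, not tuned recommendations. When doSample is left False, decoding is greedy and temperature, topK and topP have no effect.

>>> from sparknlp.annotator import T5Transformer
>>> t5 = T5Transformer.pretrained("t5_small") \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation") \
...     .setDoSample(True) \
...     .setTemperature(0.7) \
...     .setTopK(50) \
...     .setTopP(0.9)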

setRepetitionPenalty(self, value)[source]#

Sets the parameter for repetition penalty. 1.0 means no penalty.

Parameters:
value : float

The repetition penalty

References

See CTRL: A Conditional Transformer Language Model for Controllable Generation for more details.

setNoRepeatNgramSize(self, value)[source]#

Sets size of n-grams that can only occur once.

If set to int > 0, all ngrams of that size can only occur once.

Parameters:
value : int

Size of n-grams that can only occur once
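
Continuing the sampling sketch above, the repetition controls might be added like this; the values are illustrative.

>>> t5.setRepetitionPenalty(1.2)  # values above 1.0 discourage repeated tokens
>>> t5.setNoRepeatNgramSize(3)    # no trigram may appear twice in the output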

setBeamSize(self, value)[source]#

Sets the beam size for beam search.

Parameters:
value : int

Beam size for beam search

setNReturnSequences(self, value)[source]#

Sets the number of sequences to return from the beam search.

Parameters:
value : int

Number of sequences to return
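
A hedged sketch of the beam-search setters, assuming an annotator that mixes in these generator properties, for example a seq2seq model such as BartTransformer; whether a given annotator exposes both setters depends on the release, and the values are illustrative.

>>> from sparknlp.annotator import BartTransformer
>>> bart = BartTransformer.pretrained() \
...     .setInputCols(["document"]) \
...     .setOutputCol("generation") \
...     .setBeamSize(4)          # keep 4 hypotheses per step during beam search
>>> bart.setNReturnSequences(2)  # return the 2 best finished sequences, if exposed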