class WhisperForCTC extends AnnotatorModel[WhisperForCTC] with HasBatchedAnnotateAudio[WhisperForCTC] with HasAudioFeatureProperties with WriteTensorflowModel with WriteOpenvinoModel with WriteOnnxModel with HasEngine with HasGeneratorProperties with HasProtectedParams

Whisper Model with a language modeling head on top for Connectionist Temporal Classification (CTC).

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It can transcribe speech in multiple languages, as well as translate from those languages into English.

The audio needs to be provided pre-processed as an array of floats.
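
As one illustration of such pre-processing, the sketch below decodes signed 16-bit little-endian PCM bytes into floats. It assumes the audio is already mono and at the sampling rate the model expects; pcm16ToFloats is a hypothetical helper and not part of Spark NLP.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper (not part of Spark NLP): decodes signed 16-bit
// little-endian PCM bytes into floats in the range [-1.0, 1.0).
def pcm16ToFloats(bytes: Array[Byte]): Array[Float] = {
  require(bytes.length % 2 == 0, "16-bit PCM needs an even number of bytes")
  val shorts = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
  // Dividing by 32768 maps Short.MinValue to -1.0 and Short.MaxValue to just under 1.0.
  Array.tabulate(shorts.remaining())(i => shorts.get(i) / 32768.0f)
}
```

The resulting array can be placed in a DataFrame column and passed to AudioAssembler, in the same way the Example section does with floats read from a text file.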

For multilingual models, the language and the task (transcribe or translate) can be set with setLanguage and setTask.
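
A minimal sketch of that configuration, assuming model is an already loaded multilingual WhisperForCTC. The token formats follow the setter documentation (language tokens such as <|en|>, and either <|transcribe|> or <|translate|> for the task). The choice of German and the helper name translateGermanToEnglish are purely illustrative, and whether <|de|> is available depends on the model.

```scala
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC

// Hypothetical helper: configures a multilingual model to read German speech
// and produce an English translation.
def translateGermanToEnglish(model: WhisperForCTC): WhisperForCTC =
  model
    .setLanguage("<|de|>")    // language of the audio
    .setTask("<|translate|>") // use "<|transcribe|>" to keep the source language
```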

Note that, at the moment, this annotator supports only greedy search, and only Spark versions 3.4 and up are supported.

Pretrained models can be loaded with pretrained of the companion object:

val speechToText = WhisperForCTC.pretrained()
  .setInputCols("audio_assembler")
  .setOutputCol("text")

The default model is "asr_whisper_tiny_opt", if no name is provided.

For available pretrained models please see the Models Hub.

To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see WhisperForCTCTestSpec.

References:

Robust Speech Recognition via Large-Scale Weak Supervision

Paper Abstract:

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained()
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

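// Read the pre-processed audio: the first comma-separated value on each line
// is parsed as one float sample.
val bufferedSource =
  scala.io.Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")

val rawFloats = bufferedSource
  .getLines()
  .map(_.split(",").head.trim.toFloat)
  .toArray
bufferedSource.close()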

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")

val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)
+------------------------------------------------------------------------------------------+
|result                                                                                    |
+------------------------------------------------------------------------------------------+
|[ Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.]|
+------------------------------------------------------------------------------------------+

Parameters

A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.

  1. val batchSize: IntParam

    Size of every batch (Default depends on model).

    Definition Classes
    HasBatchedAnnotateAudio
  2. val beamSize: IntParam

    Beam size for the beam search algorithm (Default: 4)

    Definition Classes
    HasGeneratorProperties
  3. val configProtoBytes: IntArrayParam

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  4. val doNormalize: BooleanParam

    Whether or not to normalize the input with mean and standard deviation

    Definition Classes
    HasAudioFeatureProperties
  5. val doSample: BooleanParam

    Whether or not to use sampling, use greedy decoding otherwise (Default: false)

    Definition Classes
    HasGeneratorProperties
  6. val engine: Param[String]

    This param is set internally once via loadSavedModel, which is why there is no setter.

    Definition Classes
    HasEngine
  7. val featureSize: IntParam

    Definition Classes
    HasAudioFeatureProperties
  8. val isMultilingual: ProtectedParam[Boolean]

    Whether or not the model is multilingual.

  9. val language: Param[String]

    Optional language to set for the transcription. The imported model needs to support multiple languages.

  10. val maxInputLength: IntParam

    Maximum length of the input sequence (Default: 0)

    Definition Classes
    HasGeneratorProperties
  11. val maxOutputLength: IntParam

    Maximum length of the sequence to be generated (Default: 20)

    Definition Classes
    HasGeneratorProperties
  12. val minOutputLength: IntParam

    Minimum length of the sequence to be generated (Default: 0)

    Definition Classes
    HasGeneratorProperties
  13. val nReturnSequences: IntParam

    The number of sequences to return from the beam search.

    Definition Classes
    HasGeneratorProperties
  14. val noRepeatNgramSize: IntParam

    If set to int > 0, all ngrams of that size can only occur once (Default: 0)

    Definition Classes
    HasGeneratorProperties
  15. val paddingSide: Param[String]

    Definition Classes
    HasAudioFeatureProperties
  16. val paddingValue: FloatParam

    Definition Classes
    HasAudioFeatureProperties
  17. val randomSeed: Option[Long]

    Optional random seed for the model. The setter setRandomSeed takes a Long.

    Definition Classes
    HasGeneratorProperties
  18. val repetitionPenalty: DoubleParam

    The parameter for repetition penalty (Default: 1.0). 1.0 means no penalty. See this paper for more details.

    Definition Classes
    HasGeneratorProperties
  19. val returnAttentionMask: BooleanParam

    Definition Classes
    HasAudioFeatureProperties
  20. val samplingRate: IntParam

    Definition Classes
    HasAudioFeatureProperties
  21. val signatures: MapFeature[AnnotatorType, AnnotatorType]

    It contains TF model signatures for the loaded saved model

  22. val stopTokenIds: IntArrayParam

    Stop tokens to terminate the generation

    Definition Classes
    HasGeneratorProperties
  23. val task: Param[String]

    Set transformer task, e.g. "summarize:" (Default: "").

    Definition Classes
    HasGeneratorProperties
  24. val temperature: DoubleParam

    The value used to modulate the next token probabilities (Default: 1.0)

    Definition Classes
    HasGeneratorProperties
  25. val topK: IntParam

    The number of highest probability vocabulary tokens to keep for top-k-filtering (Default: 50)

    Definition Classes
    HasGeneratorProperties
  26. val topP: DoubleParam

    If set to float < 1.0, only the most probable tokens with probabilities that add up to topP or higher are kept for generation (Default: 1.0)

    Definition Classes
    HasGeneratorProperties
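
To make the decoding parameters above more concrete, here is a small sketch of what temperature, topK and greedy decoding conventionally mean. It is an illustration in plain Scala only, not Spark NLP's internal implementation; the function names softmax, topKFilter and greedy are hypothetical.

```scala
// Illustrative only: the conventional meaning of temperature and top-k,
// not Spark NLP's internal implementation.
def softmax(logits: Array[Double], temperature: Double): Array[Double] = {
  require(temperature > 0.0, "temperature must be positive")
  val scaled = logits.map(_ / temperature)
  val max = scaled.max // subtract the max for numerical stability
  val exps = scaled.map(x => math.exp(x - max))
  val sum = exps.sum
  exps.map(_ / sum)
}

// Keeps the k most probable tokens and renormalizes; all others get probability 0.
def topKFilter(probs: Array[Double], k: Int): Array[Double] = {
  require(k > 0, "k must be positive")
  val keep = probs.zipWithIndex.sortBy(-_._1).take(k).map(_._2).toSet
  val filtered = probs.zipWithIndex.map { case (p, i) => if (keep(i)) p else 0.0 }
  val total = filtered.sum
  filtered.map(_ / total)
}

// Greedy decoding (doSample = false) picks the most probable token.
def greedy(probs: Array[Double]): Int = probs.indexOf(probs.max)
```

A temperature below 1.0 sharpens the distribution and one above 1.0 flattens it. Since the annotator currently supports only greedy search, the sampling-related parameters are shown here for orientation only.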

Members

  1. implicit class ProtectedParam[T] extends Param[T]
    Definition Classes
    HasProtectedParams
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. def batchAnnotate(batchedAnnotations: Seq[Array[AnnotationAudio]]): Seq[Seq[Annotation]]

    Takes audio annotations and produces transcribed document annotations.

    batchedAnnotations

    Audio annotations in batches

    returns

    Transcribed audio as DOCUMENT type annotation

    Definition Classes
    WhisperForCTC → HasBatchedAnnotateAudio
  2. def batchProcess(rows: Iterator[_]): Iterator[Row]
    Definition Classes
    HasBatchedAnnotateAudio
  3. final def clear(param: Param[_]): WhisperForCTC.this.type
    Definition Classes
    Params
  4. def copy(extra: ParamMap): WhisperForCTC

    requirement for annotators copies

    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  5. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  6. def explainParams(): String
    Definition Classes
    Params
  7. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  8. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  9. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  10. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  11. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  12. def getInputCols: Array[String]

    returns

    input annotations columns currently used

    Definition Classes
    HasInputAnnotationCols
  13. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  14. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  15. final def getOutputCol: String

    Gets the name of the annotation column that will be generated

    Definition Classes
    HasOutputAnnotationCol
  16. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  17. def getVocabulary: Map[String, Int]
  18. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  19. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  20. def hasParent: Boolean
    Definition Classes
    Model
  21. val inputAnnotatorTypes: Array[AnnotatorType]

    Annotator reference id. Used to identify elements in metadata or to refer to this annotator type

    Definition Classes
    WhisperForCTC → HasInputAnnotationCols
  22. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  23. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  24. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  25. def onWrite(path: String, spark: SparkSession): Unit
  26. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  27. val outputAnnotatorType: AnnotatorType
    Definition Classes
    WhisperForCTC → HasOutputAnnotatorType
  28. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  29. var parent: Estimator[WhisperForCTC]
    Definition Classes
    Model
  30. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  31. def set[T](param: ProtectedParam[T], value: T): WhisperForCTC.this.type

    Sets the value for a protected Param.

    If the parameter was already set, it will not be set again. Default values do not count as a set value and can be overridden.

    T

    Type of the parameter

    param

    Protected parameter to set

    value

    Value for the parameter

    returns

    This object

    Definition Classes
    HasProtectedParams
  32. final def set[T](param: Param[T], value: T): WhisperForCTC.this.type
    Definition Classes
    Params
  33. final def setInputCols(value: String*): WhisperForCTC.this.type
    Definition Classes
    HasInputAnnotationCols
  34. def setInputCols(value: Array[String]): WhisperForCTC.this.type

    Overrides required annotators column if different than default

    Definition Classes
    HasInputAnnotationCols
  35. def setLazyAnnotator(value: Boolean): WhisperForCTC.this.type
    Definition Classes
    CanBeLazy
  36. def setMaxInputLength(value: Int): WhisperForCTC.this.type
    Definition Classes
    HasGeneratorProperties
  37. final def setOutputCol(value: String): WhisperForCTC.this.type

    Overrides annotation column name when transforming

    Definition Classes
    HasOutputAnnotationCol
  38. def setParent(parent: Estimator[WhisperForCTC]): WhisperForCTC
    Definition Classes
    Model
  39. def setVocabulary(value: Map[String, Int]): WhisperForCTC.this.type
  40. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  41. final def transform(dataset: Dataset[_]): DataFrame

    Given that the requirements are met, this applies the ML transformation within a Pipeline or stand-alone. The output annotation is generated as a new column, and previous annotations remain available separately. Metadata is built at the schema level to record structural information about annotations outside their content.

    dataset

    Dataset[Row]

    Definition Classes
    AnnotatorModel → Transformer
  42. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  43. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  44. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation. It is called on fit().

    Definition Classes
    RawAnnotator → PipelineStage
  45. val uid: String
    Definition Classes
    WhisperForCTC → Identifiable
  46. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
  47. def writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOnnxModel
  48. def writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOnnxModel
  49. def writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOpenvinoModel
  50. def writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit
    Definition Classes
    WriteOpenvinoModel
  51. def writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String = "_use"): Unit
    Definition Classes
    WriteTensorflowModel
  52. def writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None): Unit
    Definition Classes
    WriteTensorflowModel
  53. def writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]] = None, savedSignatures: Option[Map[String, String]] = None): Unit
    Definition Classes
    WriteTensorflowModel
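
As a sketch of how the write member is typically used, the snippet below saves a model and reads it back. The helper name roundTrip is hypothetical, and the load method on the companion object is assumed here because it is not listed on this page.

```scala
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC

// Hypothetical helper: writes the model to `path` and reads it back.
def roundTrip(model: WhisperForCTC, path: String): WhisperForCTC = {
  // `write` returns an MLWriter; `overwrite()` replaces any existing output at `path`.
  model.write.overwrite().save(path)
  // Assumption: the companion object provides `load`, as Spark ML readable models usually do.
  WhisperForCTC.load(path)
}
```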

Parameter setters

  1. def setBatchSize(size: Int): WhisperForCTC.this.type

    Size of every batch.

    Definition Classes
    HasBatchedAnnotateAudio
  2. def setBeamSize(beamNum: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  3. def setConfigProtoBytes(bytes: Array[Int]): WhisperForCTC.this.type

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  4. def setDoNormalize(value: Boolean): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  5. def setDoSample(value: Boolean): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  6. def setFeatureSize(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  7. def setIsMultilingual(value: Boolean): WhisperForCTC.this.type

  8. def setLanguage(value: String): WhisperForCTC.this.type

    Sets the language for the audio, formatted to e.g. <|en|>. Check the model description for supported languages.

  9. def setMaxOutputLength(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  10. def setMinOutputLength(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  11. def setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrappers: Option[EncoderDecoderWrappers], openvinoWrapper: Option[EncoderDecoderWrappers]): WhisperForCTC.this.type

  12. def setNReturnSequences(beamNum: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  13. def setNoRepeatNgramSize(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  14. def setPaddingSide(value: String): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  15. def setPaddingValue(value: Float): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  16. def setRandomSeed(value: Long): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  17. def setRepetitionPenalty(value: Double): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  18. def setReturnAttentionMask(value: Boolean): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  19. def setSamplingRate(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasAudioFeatureProperties
  20. def setSignatures(value: Map[String, String]): WhisperForCTC.this.type

  21. def setStopTokenIds(value: Array[Int]): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  22. def setTask(value: String): WhisperForCTC.this.type

    Sets the formatted task for the audio. Either <|translate|> or <|transcribe|>.

    Only multilingual models can do translation.

    Definition Classes
    WhisperForCTC → HasGeneratorProperties
  23. def setTemperature(value: Double): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  24. def setTopK(value: Int): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
  25. def setTopP(value: Double): WhisperForCTC.this.type

    Definition Classes
    HasGeneratorProperties
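
Because each setter returns the annotator itself, calls can be chained. A sketch with arbitrary illustrative values (not recommended settings), assuming model is a loaded WhisperForCTC; the helper name tune is hypothetical.

```scala
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC

// Hypothetical helper: the values are arbitrary and only show the chaining style.
def tune(model: WhisperForCTC): WhisperForCTC =
  model
    .setBatchSize(4)          // number of audio inputs processed per batch
    .setMaxOutputLength(128)  // upper bound on generated sequence length
    .setNoRepeatNgramSize(3)  // each 3-gram may occur only once
    .setDoSample(false)       // greedy decoding instead of sampling
```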

Parameter getters

  1. def getBatchSize: Int

    Size of every batch.

    Definition Classes
    HasBatchedAnnotateAudio
  2. def getBeamSize: Int

    Definition Classes
    HasGeneratorProperties
  3. def getConfigProtoBytes: Option[Array[Byte]]

    ConfigProto from tensorflow, serialized into byte array. Get with config_proto.SerializeToString()

  4. def getDoNormalize: Boolean

    Definition Classes
    HasAudioFeatureProperties
  5. def getDoSample: Boolean

    Definition Classes
    HasGeneratorProperties
  6. def getEngine: String

    Definition Classes
    HasEngine
  7. def getFeatureSize: Int

    Definition Classes
    HasAudioFeatureProperties
  8. def getIsMultilingual: Boolean

  9. def getLanguage: Option[String]

  10. def getMaxOutputLength: Int

    Definition Classes
    HasGeneratorProperties
  11. def getMinOutputLength: Int

    Definition Classes
    HasGeneratorProperties
  12. def getModelIfNotSet: Whisper

  13. def getNReturnSequences: Int

    Definition Classes
    HasGeneratorProperties
  14. def getNoRepeatNgramSize: Int

    Definition Classes
    HasGeneratorProperties
  15. def getPaddingSide: String

    Definition Classes
    HasAudioFeatureProperties
  16. def getPaddingValue: Float

    Definition Classes
    HasAudioFeatureProperties
  17. def getRandomSeed: Option[Long]

    Definition Classes
    HasGeneratorProperties
  18. def getRepetitionPenalty: Double

    Definition Classes
    HasGeneratorProperties
  19. def getReturnAttentionMask: Boolean

    Definition Classes
    HasAudioFeatureProperties
  20. def getSamplingRate: Int

    Definition Classes
    HasAudioFeatureProperties
  21. def getSignatures: Option[Map[String, String]]

  22. def getStopTokenIds: Array[Int]

    Definition Classes
    HasGeneratorProperties
  23. def getTask: Option[String]

    Definition Classes
    HasGeneratorProperties
  24. def getTemperature: Double

    Definition Classes
    HasGeneratorProperties
  25. def getTopK: Int

    Definition Classes
    HasGeneratorProperties
  26. def getTopP: Double

    Definition Classes
    HasGeneratorProperties
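
The getters can be combined to inspect how a loaded model is configured. A small sketch, assuming model is a loaded WhisperForCTC; the helper name describe is hypothetical.

```scala
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC

// Hypothetical helper: summarizes a few settings of a loaded model.
def describe(model: WhisperForCTC): String = {
  // getLanguage and getTask return Option[String], so unset values need a fallback.
  val language = model.getLanguage.getOrElse("(not set)")
  val task = model.getTask.getOrElse("(not set)")
  s"engine=${model.getEngine}, multilingual=${model.getIsMultilingual}, " +
    s"language=$language, task=$task, maxOutputLength=${model.getMaxOutputLength}"
}
```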