Official whisper-tiny


Official pretrained Whisper model, adapted from Hugging Face Transformers and curated for scalability and production-readiness with Spark NLP.

This is a multilingual model and supports the following languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

How to use

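# Python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw audio samples so downstream annotators can consume them
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Load the pretrained multilingual Whisper model
speechToText = WhisperForCTC.pretrained("asr_whisper_tiny", "xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# rawFloats is not defined here: it must hold the audio samples as a list of floats
processedAudioFloats = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = False)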
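
// Scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline
import scala.io.Source

val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained("asr_whisper_tiny", "xx")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// Read the audio samples from a text file (assumed here to hold one float per line)
val bufferedSource = Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")

val rawFloats = bufferedSource.getLines().map(_.split(",").head.trim.toFloat).toArray
bufferedSource.close()

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")

val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)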
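
The Python example above uses `rawFloats` without showing where it comes from. As a minimal sketch, assuming the audio is a mono, 16-bit PCM WAV file sampled at 16 kHz (the rate Whisper models are generally fed), the samples could be read with the Python standard library alone. The helper name `load_wav_as_floats` and its `expected_rate` parameter are hypothetical and not part of Spark NLP.

```python
import struct
import wave


def load_wav_as_floats(path, expected_rate=16000):
    """Read a mono 16-bit PCM WAV file and return its samples as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        # Fail early on formats this sketch does not handle
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        frames = wav.readframes(wav.getnframes())

    # WAV PCM data is little-endian signed 16-bit integers
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames)

    # Scale integers to floats; 32768 is the magnitude of the most negative int16
    return [sample / 32768.0 for sample in samples]
```

The returned list can then be assigned to `rawFloats` before building the DataFrame. Audio in other formats or sample rates would need converting or resampling first, which this sketch does not attempt.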

Model Information

Model Name: asr_whisper_tiny
Compatibility: Spark NLP 5.1.0+
License: Open Source
Edition: Official
Input Labels: [audio_assembler]
Output Labels: [text]
Language: xx
Size: 156.6 MB