Description
Official pretrained Whisper model, adapted from Hugging Face Transformers and curated to provide scalability and production readiness using Spark NLP.
This is a multilingual model that supports the following languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
How to use
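Python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import librosa  # Assumption: librosa (not part of Spark NLP) is used only to load audio

spark = sparknlp.start()

# Convert the raw float array into Spark NLP's audio annotation type
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Multilingual Whisper (tiny) speech-to-text model
speechToText = WhisperForCTC.pretrained("asr_whisper_tiny", "xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# Hypothetical path; Whisper expects mono audio sampled at 16 kHz
rawFloats, _ = librosa.load("path/to/audio.wav", sr=16000)
processedAudioFloats = spark.createDataFrame([[rawFloats.tolist()]]).toDF("audio_content")
result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate=False)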
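If you would rather not depend on an extra audio library, `rawFloats` can also be built with Python's standard `wave` module. The sketch below is a hedged alternative, not part of Spark NLP: `wav_to_floats` is a hypothetical helper name, and it assumes a 16-bit PCM WAV file that is already sampled at 16 kHz (it does not resample). Multi-channel audio is averaged into one mono track and scaled to the range [-1.0, 1.0].

```python
import struct
import wave


def wav_to_floats(path):
    """Read a 16-bit PCM WAV file and return mono samples scaled to [-1.0, 1.0].

    Hypothetical helper: assumes the file is already 16 kHz; no resampling is done.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported in this sketch")
        if wav.getframerate() != 16000:
            raise ValueError("expected 16 kHz audio; resample first")
        n_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    # Little-endian signed 16-bit integers, interleaved by channel
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[: count * 2])

    # Average each frame's channels into one mono sample, then scale to [-1, 1]
    mono = []
    for i in range(0, len(samples) - n_channels + 1, n_channels):
        frame = samples[i : i + n_channels]
        mono.append(sum(frame) / n_channels / 32768.0)
    return mono
```

Usage would be `rawFloats = wav_to_floats("path/to/audio.wav")` (hypothetical path), after which the resulting Python list can be passed to `spark.createDataFrame([[rawFloats]])` directly.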
Scala
// Assumes an existing SparkSession named `spark` (for example, in spark-shell)
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// Convert the raw float array into Spark NLP's audio annotation type
val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

// Multilingual Whisper (tiny) speech-to-text model
val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained("asr_whisper_tiny", "xx")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// Path relative to the Spark NLP source tree; replace with your own file.
// Each line holds one audio sample; only the first comma-separated field is used.
val bufferedSource =
  scala.io.Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")
val rawFloats = bufferedSource
  .getLines()
  .map(_.split(",").head.trim.toFloat)
  .toArray
bufferedSource.close()

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")
val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)
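The Scala example reads its audio from a text file with one sample per line, keeping only the first comma-separated field. A Python equivalent for producing `rawFloats` from the same kind of file might look like the following sketch; `read_float_lines` is a hypothetical helper name, and unlike the Scala snippet it skips blank lines instead of failing on them.

```python
def read_float_lines(path):
    """Parse one float per line, keeping only the first comma-separated field.

    Hypothetical helper mirroring the Scala snippet; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        return [float(line.split(",")[0].strip()) for line in f if line.strip()]
```

The returned list of Python floats can then be wrapped as `spark.createDataFrame([[rawFloats]]).toDF("audio_content")`, exactly as in the Python pipeline.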
Model Information
Model Name: asr_whisper_tiny
Compatibility: Spark NLP 5.1.0+
License: Open Source
Edition: Official
Input Labels: [audio_assembler]
Output Labels: [text]
Language: xx
Size: 156.6 MB