ASR HubertForCTC - asr_hubert_large_ls960

Description

HuBERT model with a language modeling head on top for Connectionist Temporal Classification (CTC). HuBERT was proposed in "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units" by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov and Abdelrahman Mohamed.
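
A CTC head produces one score vector per audio frame, and turning those scores into text is commonly done with greedy (best-path) decoding. The sketch below illustrates that general technique in plain Python. It is not Spark NLP's internal implementation, and the toy setup (blank token at index 0, "|" as the word delimiter, following the convention of Hugging Face character vocabularies) is an assumption made for illustration.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: List[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely token id for every audio frame
    best_ids = [max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores]
    tokens: List[str] = []
    prev_id = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame and is not blank;
        # a blank between two identical tokens lets a letter repeat ("l", blank, "l" -> "ll").
        if token_id != prev_id and token_id != blank_id:
            tokens.append(vocab[token_id])
        prev_id = token_id
    # The word-delimiter token stands for the space between words
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

For example, with a hypothetical vocabulary `["<pad>", "|", "C", "A", "T"]`, frames whose best tokens are C, C, blank, A, T, T decode to "CAT". Real decoders may use beam search or an external language model instead; greedy decoding is simply the easiest variant to read.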

This is the large model, fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
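
Because the model expects 16 kHz input, it can help to check the sampling rate explicitly before building the input DataFrame. Below is a minimal sketch using only the Python standard library, assuming mono 16-bit PCM WAV files. The function name `load_wav_16k_mono` is hypothetical, and it rejects other rates rather than resampling them; use a dedicated resampler for that.

```python
import struct
import wave
from typing import List

EXPECTED_RATE = 16_000  # sampling rate the model was fine-tuned on


def load_wav_16k_mono(path: str) -> List[float]:
    """Read a mono 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0).

    Raises ValueError if the file is not 16 kHz, mono, 16-bit audio.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; resample first")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    # 16-bit PCM WAV samples are little-endian signed shorts
    samples = struct.unpack(f"<{n_frames}h", raw)
    # Scale to floating point in [-1.0, 1.0)
    return [s / 32768.0 for s in samples]
```

The returned list is the kind of raw float waveform that the `audio_content` column used in the pipeline below is expected to hold.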


How to use

                
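Python

from sparknlp.base import AudioAssembler
from sparknlp.annotator import HubertForCTC
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session (e.g. one created with sparknlp.start())
# and a DataFrame `audioDf` whose "audio_content" column holds the raw
# waveform as an array of floats sampled at 16 kHz.

# Turn the raw float array into an audio annotation
audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Pretrained HuBERT CTC model that transcribes the audio into text
speech_to_text = HubertForCTC.pretrained("asr_hubert_large_ls960", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

pipelineModel = pipeline.fit(audioDf)

pipelineDF = pipelineModel.transform(audioDf)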

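Scala

import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.HubertForCTC
import org.apache.spark.ml.Pipeline

// Assumes `audioDf` has an "audio_content" column of float samples at 16 kHz
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

// Pretrained HuBERT CTC model that transcribes the audio into text
val speechToText = HubertForCTC
    .pretrained("asr_hubert_large_ls960", "en")
    .setInputCols("audio_assembler")
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

val pipelineModel = pipeline.fit(audioDf)

val pipelineDF = pipelineModel.transform(audioDf)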

NLU

import nlu
# `audio_input` is a hypothetical placeholder for your 16 kHz speech input
nlu.load("en.speech2text.hubert").predict(audio_input)

Model Information

Model Name: asr_hubert_large_ls960
Compatibility: Spark NLP 4.3.0+
License: Open Source
Edition: Official
Input Labels: [audio_assembler]
Output Labels: [text]
Language: en
Size: 1.5 GB

References

https://huggingface.co/facebook/hubert-large-ls960-ft