Description
Hubert Model with a language modeling head on top for Connectionist Temporal Classification (CTC). Hubert was proposed in "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units" by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed.
This is the large model, fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. When using the model, make sure your speech input is also sampled at 16 kHz.
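Because the model expects 16 kHz input, it can help to check the sampling rate before building the input DataFrame. The following is a minimal sketch using only the Python standard library, not part of Spark NLP: the helper name `load_wav_as_floats` is hypothetical, and it assumes the audio is a mono 16-bit PCM WAV file and that samples should be normalized floats.

```python
import array
import sys
import wave

EXPECTED_RATE = 16_000  # sampling rate the model was fine-tuned on


def load_wav_as_floats(path, expected_rate=EXPECTED_RATE):
    """Read a mono 16-bit PCM WAV file and return its samples as floats in [-1, 1).

    Hypothetical helper for illustration; raises ValueError when the file
    does not match the expected sampling rate or sample format.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio, got {rate} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]
```

The returned list of floats could then serve as one row of the "audio_content" column in the DataFrame passed to the pipeline. Files recorded at a different rate would need to be resampled with an audio library first; this sketch only detects the mismatch.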
How to use
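# Python
from pyspark.ml import Pipeline
from sparknlp.base import AudioAssembler
from sparknlp.annotator import HubertForCTC

# Wrap the raw audio samples so Spark NLP annotators can consume them
audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Load the pretrained English HuBERT CTC model
speech_to_text = HubertForCTC.pretrained("asr_hubert_large_ls960", "en") \
    .setInputCols("audio_assembler") \
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

# audioDf is assumed to be a DataFrame whose "audio_content" column holds 16 kHz audio samples
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)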
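// Scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.HubertForCTC
import org.apache.spark.ml.Pipeline

// Wrap the raw audio samples so Spark NLP annotators can consume them
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

// Load the pretrained English HuBERT CTC model
val speechToText = HubertForCTC
  .pretrained("asr_hubert_large_ls960", "en")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// audioDf is assumed to be a DataFrame whose "audio_content" column holds 16 kHz audio samples
val pipelineModel = pipeline.fit(audioDf)
val pipelineDF = pipelineModel.transform(audioDf)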
# NLU
import nlu
# "path/to/audio.wav" is a placeholder; point it at a 16 kHz audio file
nlu.load("en.speech2text.hubert").predict("path/to/audio.wav")
Model Information
| Model Name: | asr_hubert_large_ls960 |
| Compatibility: | Spark NLP 4.3.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [audio_assembler] |
| Output Labels: | [text] |
| Language: | en |
| Size: | 1.5 GB |