Description
Hubert Model with a language modeling head on top for Connectionist Temporal Classification (CTC). Hubert was proposed in HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed.
This is the large model, fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
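If your recordings use a different sample rate, resample them before building the input DataFrame. The snippet below is a minimal sketch of one way to do this with librosa, which is not a Spark NLP dependency; the file name is hypothetical.

# Sketch: load an audio file and resample it to 16 kHz (librosa is an assumption
# here, not part of Spark NLP; the path is hypothetical).
import librosa

waveform, sample_rate = librosa.load("sample_speech.wav", sr=16000)  # resamples to 16 kHz
audio_floats = [float(x) for x in waveform]  # plain Python floats for the Spark DataFrame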
Predicted Entities
How to use
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = HubertForCTC \
    .pretrained("asr_hubert_large_ls960", "en") \
    .setInputCols("audio_assembler") \
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
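The pipeline above expects audioDf to contain a column named audio_content holding arrays of floats sampled at 16 kHz. A minimal sketch, assuming a Spark session started via sparknlp.start() and the audio_floats list from the resampling example above:

# Sketch (assumes `spark` from sparknlp.start() and `audio_floats` from the
# resampling example above): the input DataFrame is built like this before fitting,
audioDf = spark.createDataFrame([[audio_floats]], ["audio_content"])

# and after transform() the transcription can be inspected with:
pipelineDF.select("text.result").show(truncate=False)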
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.HubertForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = HubertForCTC
  .pretrained("asr_hubert_large_ls960", "en")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

val pipelineModel = pipeline.fit(audioDf)
val pipelineDF = pipelineModel.transform(audioDf)
Model Information
| Model Name: | asr_hubert_large_ls960 |
|:---|:---|
| Compatibility: | Spark NLP 5.5.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [audio_assembler] |
| Output Labels: | [text] |
| Language: | en |
| Size: | 1.5 GB |
References
https://huggingface.co/facebook/hubert-large-ls960-ft