Description
Pretrained Wav2Vec2 model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. asr_wav2vec2_base_960h_by_facebook is an English model originally trained by facebook.
How to use
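from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC
from pyspark.ml import Pipeline

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = Wav2Vec2ForCTC \
    .pretrained("asr_wav2vec2_base_960h", "en") \
    .setInputCols("audio_assembler") \
    .setOutputCol("text")

# audio_df is assumed to be an existing DataFrame with an "audio_content" column
pipeline = Pipeline(stages=[audio_assembler, speech_to_text])
pipeline_model = pipeline.fit(audio_df)
pipeline_df = pipeline_model.transform(audio_df)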
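// Scala equivalent
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.Wav2Vec2ForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = Wav2Vec2ForCTC
  .pretrained("asr_wav2vec2_base_960h", "en")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

// audioDf is assumed to be an existing DataFrame with an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(audioDf)
val pipelineDF = pipelineModel.transform(audioDf)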
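The code above fits the pipeline on audioDf but does not show how its audio_content column is filled. The following is a minimal sketch of one way to prepare the samples in Python with only the standard library. It assumes that the column should hold raw audio as an array of floats, and that the input is a mono, 16-bit PCM WAV file sampled at 16 kHz (the rate the referenced Hugging Face model card asks for). The helper name wav_to_floats is hypothetical and not part of Spark NLP.

```python
import array
import sys
import wave

# Assumption: input audio is sampled at 16 kHz, as the Hugging Face
# model card for wav2vec2-base-960h requires.
EXPECTED_RATE = 16000


def wav_to_floats(path):
    """Read a mono 16-bit PCM WAV file and return its samples as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError("expected a 16 kHz sampling rate")
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Scale integer samples to floating point.
    return [s / 32768.0 for s in samples]
```

Given an active SparkSession, the returned list could then be placed in a single-row DataFrame under the column name audio_content, cast to an array of floats if needed, and passed to the pipeline as the input DataFrame.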
Model Information
| Model Name: | asr_wav2vec2_base_960h |
| Compatibility: | Spark NLP 5.5.0+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [audio_assembler] |
| Output Labels: | [text] |
| Language: | en |
| Size: | 233.0 MB |
References
https://huggingface.co/facebook/wav2vec2-base-960h