English asr_wav2vec2_base_960h TFWav2Vec2ForCTC from facebook

Description

Pretrained Wav2Vec2 model for automatic speech recognition, adapted from Hugging Face and curated for scalability and production readiness with Spark NLP. asr_wav2vec2_base_960h_by_facebook is an English model originally trained by facebook.

How to use

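# Python; audio_df is an assumed DataFrame with an "audio_content" column
from pyspark.ml import Pipeline
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = Wav2Vec2ForCTC \
    .pretrained("asr_wav2vec2_base_960h", "en") \
    .setInputCols("audio_assembler") \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
pipeline_df = pipeline.fit(audio_df).transform(audio_df)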

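// Scala; audioDf is an assumed DataFrame with an "audio_content" column
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.Wav2Vec2ForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = Wav2Vec2ForCTC
    .pretrained("asr_wav2vec2_base_960h", "en")
    .setInputCols("audio_assembler")
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

val pipelineModel = pipeline.fit(audioDf)

val pipelineDF = pipelineModel.transform(audioDf)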
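
The examples in this section read from an "audio_content" column but do not show how it is filled. As far as we know, AudioAssembler expects each row to hold the raw audio as an array of floats, and the referenced wav2vec2-base-960h checkpoint was trained on 16 kHz speech. The following is a minimal sketch of one way to get such an array from a WAV file using only the Python standard library. The helper name wav_to_floats is hypothetical, and the restriction to 16-bit mono PCM is an assumption made to keep the example short; resampling and other formats are not handled.

```python
import array
import sys
import wave


def wav_to_floats(path, expected_rate=16000):
    """Read a 16-bit mono PCM WAV file and return its samples as floats.

    Hypothetical helper, not part of Spark NLP. Samples are scaled from
    the int16 range to [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian

    # Divide by 2**15 so that -32768 maps to -1.0
    return [s / 32768.0 for s in samples]
```

Given an active SparkSession named spark (assumed, not shown here), the resulting list could then be wrapped in a one-row DataFrame, for example spark.createDataFrame([[wav_to_floats("sample.wav")]], ["audio_content"]), and passed to the pipeline as the audio DataFrame. The file name sample.wav is a placeholder.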

Model Information

Model Name:	asr_wav2vec2_base_960h
Compatibility:	Spark NLP 5.5.0+
License:	Open Source
Edition:	Official
Input Labels:	[audio_assembler]
Output Labels:	[text]
Language:	en
Size:	233.0 MB

References

https://huggingface.co/facebook/wav2vec2-base-960h
