English asr_wav2vec2_base_960h TFWav2Vec2ForCTC from facebook

Description

Pretrained Wav2vec2 model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. asr_wav2vec2_base_960h_by_facebook is an English model originally trained by facebook.

How to use

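# Python: audio_df needs an "audio_content" column holding arrays of floats
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC
from pyspark.ml import Pipeline

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = Wav2Vec2ForCTC \
    .pretrained("asr_wav2vec2_base_960h", "en") \
    .setInputCols("audio_assembler") \
    .setOutputCol("text")

pipeline = Pipeline(stages=[audio_assembler, speech_to_text])
pipeline_df = pipeline.fit(audio_df).transform(audio_df)
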
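// Scala: audioDf needs an "audio_content" column holding arrays of floats
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.Wav2Vec2ForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = Wav2Vec2ForCTC
    .pretrained("asr_wav2vec2_base_960h", "en")
    .setInputCols("audio_assembler")
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

val pipelineModel = pipeline.fit(audioDf)

val pipelineDF = pipelineModel.transform(audioDf)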
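
The pipeline above expects an input DataFrame whose "audio_content" column already holds the decoded audio samples. A minimal sketch of one way to prepare that input in Python is shown here, assuming a 16-bit mono PCM WAV file sampled at 16 kHz (the rate the upstream wav2vec2-base-960h model was trained on). The helper names wav_to_floats and make_audio_df are hypothetical and not part of Spark NLP; only the standard-library wave and struct modules and the usual spark.createDataFrame call are used.

```python
import struct
import wave


def wav_to_floats(path):
    """Read a 16-bit mono PCM WAV file.

    Returns (sample_rate, samples), with samples scaled to [-1.0, 1.0].
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Two bytes per sample, little-endian signed integers
    n_samples = len(raw) // 2
    ints = struct.unpack("<%dh" % n_samples, raw)
    return sample_rate, [s / 32768.0 for s in ints]


def make_audio_df(spark, path):
    """Hypothetical helper: wrap one file's samples in a one-row DataFrame."""
    sample_rate, samples = wav_to_floats(path)
    if sample_rate != 16000:
        # Assumption: resampling is done elsewhere before calling this helper
        raise ValueError("resample the audio to 16 kHz first")
    # One row, one column: the array of samples expected by AudioAssembler
    return spark.createDataFrame([(samples,)], ["audio_content"])
```

With an active Spark session, the result of make_audio_df(spark, "sample.wav") (the file name is a placeholder) could be passed to fit and transform in place of audio_df, and the transcription read back with pipeline_df.select("text.result").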

Model Information

Model Name: asr_wav2vec2_base_960h
Compatibility: Spark NLP 5.5.0+
License: Open Source
Edition: Official
Input Labels: [audio_assembler]
Output Labels: [text]
Language: en
Size: 233.0 MB

References

https://huggingface.co/facebook/wav2vec2-base-960h