sparknlp.base.audio_assembler

Contains classes for the AudioAssembler.

Module Contents

Classes

AudioAssembler

Prepares Floats or Doubles from processed audio files.

class AudioAssembler

Prepares Floats or Doubles from processed audio files. This component is needed to process audio.

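====================== ======================
Input Annotation types Output Annotation type
====================== ======================
NONE                   AUDIO
====================== ======================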

Parameters:
inputCol
    Input column name
outputCol
    Output column name

Examples

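>>> import sparknlp
>>> from sparknlp.base import *
>>> from pyspark.ml import Pipeline
>>> from pyspark.sql.functions import col
>>> spark = sparknlp.start()
>>> # Read the audio samples and expose them as an array<float> column
>>> data = (
...     spark.read.option("inferSchema", True)
...     .parquet("./tmp/librispeech_asr_dummy_clean_audio_array_parquet")
...     .select(col("float_array").cast("array<float>").alias("audio_content"))
... )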
>>> audioAssembler = AudioAssembler().setInputCol("audio_content").setOutputCol("audio_assembler")
>>> result = audioAssembler.transform(data)
>>> result.select("audio_assembler").show()
>>> result.select("audio_assembler").printSchema()
root
 |-- audio_content: array (nullable = true)
 |    |-- element: float (containsNull = true)

setInputCol(value)

Sets input column name.

Parameters:
value : str
    Name of the input column that contains audio as Array[Float] or Array[Double]
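
To illustrate what such an input column might hold, here is a minimal sketch that decodes a WAV file into a plain list of floats using only the Python standard library. It is not part of Spark NLP: the helper names `wav_to_float_list` and `make_wav` are hypothetical, and the sketch assumes mono, 16-bit PCM audio, normalized by dividing by 32768.

```python
import array
import io
import sys
import wave


def wav_to_float_list(wav_bytes):
    """Decode 16-bit PCM WAV bytes into floats in [-1.0, 1.0). (Hypothetical helper.)"""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Assumption: normalize by 32768 so that -32768 maps to -1.0
    return [s / 32768.0 for s in samples]


def make_wav(samples, rate=16000):
    """Build a small mono 16-bit WAV in memory, for demonstration only."""
    pcm = array.array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


audio_content = wav_to_float_list(make_wav([0, 16384, -32768]))
print(audio_content)  # [0.0, 0.5, -1.0]
```

A list like `audio_content` could then be placed in a DataFrame column, for example with `spark.createDataFrame([(audio_content,)], ["audio_content"])`, and that column name passed to `setInputCol`. Whether this normalization and sample rate suit a given downstream model is something to verify against that model's documentation.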

setOutputCol(value)

Sets output column name.

Parameters:
value : str
    Name of the output column

getOutputCol()

Gets output column name of annotations.