"Audio" (Net Encoder)


NetEncoder["Audio"]
represents an encoder that converts an audio file or object into a tensor of audio samples.


NetEncoder[{"Audio","param"->val,…}]
represents an encoder with specific parameters for preprocessing.


  • The "Audio" encoder returns the waveform of the signal. All the information that was in the original signal is present in the waveform.
  • NetEncoder[…][input] applies the encoder to an input to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
  • The input to the encoder can be an Audio object or a File[…] expression.
  • The output of the encoder is a matrix of size n×1, where n is the number of audio samples after the preprocessing is applied.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
  • Parameters
  • The following general parameters are supported:
  • "Augmentation"     None     augmentation to be applied
    "Normalization"    None     whether to apply normalization
    "SampleRate"       16000    target sample rate
    "TargetLength"     All      target output length
  • The following settings and suboptions can be specified for each encoder parameter.
  • "Normalization" can take the following settings:
  • None             no normalization
    "Max"            absolute maximum value normalized to 1
    {"Max",val}      absolute maximum value normalized to val
    {"RMS",val}      RMS of input audio signal normalized to val
  • "TargetLength" can take the following settings:
  • All    same as input signal
    dur    the duration dur specified as a time quantity
    n      the first n samples
  • If the specified "TargetLength" does not match the length of the input signal, the signal is padded or trimmed accordingly.
  • "Augmentation" can be specified as a list of rules with the following keys:
  • "Convolution"    None    convolves an impulse response with the input
    "Noise"          None    adds noise to the input
    "TimeShift"      None    shifts the input by a specified amount
    "Volume"         None    multiplies the input by a constant
  • Any augmentation parameter that accepts a numeric value can also be specified as a list of two numbers or a univariate distribution. In the first case, the value will be randomized according to a uniform distribution between the given bounds. In the second, the user-provided distribution will be used.
  • Possible values for "Convolution" include:
  • None            no augmentation
    signal          File or Audio object to be convolved with input
    {mix,signal}    signal to be convolved with input and mix parameter
  • Possible values for "Noise" include:
  • None     no augmentation
    amp      white noise with amplitude amp
    noise    File or Audio object containing the noise signal to be added
  • A noise signal can be combined with a specified amplitude.
  • Use "TimeShift"->t to shift the input by t seconds, padding or trimming if necessary. Use Scaled[s] to shift the input by s×dur seconds, where dur is the duration of the input signal. Use {t1,t2} or Scaled[{s1,s2}] to randomize the shift between the specified bounds.
  • Use "Volume"->val to specify a constant multiplier.
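
The parameters described above can be combined in a single encoder specification. The following is a minimal sketch, not taken from the original page: the numeric values (8000 Hz, one second, the augmentation ranges), the variable names enc, augEnc and net, and the choice of layers in the small net are all illustrative assumptions.

```wolfram
(* Encoder that resamples to 8 kHz, pads or trims to one second,
   and scales the absolute maximum of the signal to 1 *)
enc = NetEncoder[{"Audio",
   "SampleRate" -> 8000,
   "TargetLength" -> Quantity[1, "Seconds"],
   "Normalization" -> "Max"}]

(* Encoder with augmentation; a pair of numbers is read as the bounds
   of a uniform distribution, as stated in the notes above *)
augEnc = NetEncoder[{"Audio",
   "Augmentation" -> {
     "Volume" -> {0.5, 1.5},
     "TimeShift" -> Scaled[{0, 0.1}],
     "Noise" -> 0.01}}]

(* Attach an encoder to the input port of a small, hypothetical net *)
net = NetChain[{LongShortTermMemoryLayer[8], SequenceLastLayer[]},
  "Input" -> NetEncoder["Audio"]]
```

Giving the augmentation values as ranges rather than constants means each evaluation draws a different volume and shift, which is usually the point of augmentation.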



Basic Examples  (2)

Create an audio NetEncoder:
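
A minimal sketch of this step, assuming default parameters (the variable name enc is illustrative):

```wolfram
(* Audio encoder with default settings: 16000 Hz sample rate,
   no normalization, no augmentation, full input length *)
enc = NetEncoder["Audio"]
```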


Create an Audio object with three samples:
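
A minimal sketch, assuming illustrative sample values. The sample rate is set to 16000 to match the encoder default, so that resampling does not change the number of samples:

```wolfram
(* Single-channel Audio object built from three sample values *)
audio = Audio[{0.1, -0.3, 0.5}, SampleRate -> 16000]
```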


Apply the encoder to the Audio object:
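
A self-contained sketch of this step, assuming a default encoder and an illustrative three-sample signal (enc and audio are hypothetical names):

```wolfram
enc = NetEncoder["Audio"];
audio = Audio[{0.1, -0.3, 0.5}, SampleRate -> 16000];

(* Apply the encoder; the result holds one row per audio sample *)
enc[audio]
```

Following the description of the output as an n×1 matrix, the result here should have three rows and one column.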


Plot the result of the encoder:
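
A self-contained sketch of this step, under the same assumptions (default encoder, illustrative three-sample signal). The n×1 output is flattened to a plain list before plotting:

```wolfram
enc = NetEncoder["Audio"];
audio = Audio[{0.1, -0.3, 0.5}, SampleRate -> 16000];

(* Flatten the n×1 matrix to a vector so it plots as a single curve *)
ListLinePlot[Flatten[Normal[enc[audio]]], PlotRange -> All]
```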


Scope  (3)

Parameters  (3)

Possible Issues  (1)