"Audio" (Net Encoder)

NetEncoder["Audio"]

represents an encoder that converts an audio file or object into a tensor of audio samples.

NetEncoder[{"Audio","param"->val,…}]

represents an encoder with specific parameters for preprocessing.

Details

  • The "Audio" encoder returns the waveform of the signal. All the information that was in the original signal is present in the waveform.
  • NetEncoder[…][input] applies the encoder to an input to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
  • The input to the encoder can be an Audio object or a File[…] expression.
  • The output of the encoder is a matrix of size n×1, where n is the number of audio samples after the preprocessing is applied.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
  Parameters
  • The following general parameters are supported:
      "Augmentation"     None     augmentation to be applied
      "Normalization"    None     whether to apply normalization
      "SampleRate"       16000    target sample rate
      "TargetLength"     All      target output length
  • The following settings and suboptions can be specified for each encoder parameter.
  • "Normalization" can take the following settings:
      None             no normalization
      "Max"            absolute maximum value normalized to 1
      {"Max",val}      absolute maximum value normalized to val
      {"RMS",val}      RMS of input audio signal normalized to val
  • "TargetLength" can take the following settings:
      All              same as input signal
      dur              the duration dur specified as a time quantity
      n                the first n samples
  • If the specified "TargetLength" does not match the length of the input signal, padding or trimming is applied accordingly.
  • "Augmentation" can be specified as a list of rules with the following keys:
      "Convolution"    None    convolves the input with an impulse response
      "Noise"          None    adds noise to the input
      "TimeShift"      None    shifts the input by a specified amount
      "Volume"         None    multiplies the input by a constant
  • Any augmentation parameter that accepts a numeric value can also be specified as a list of two numbers or a univariate distribution. In the first case, the value will be randomized according to a uniform distribution between the given bounds. In the second, the user-provided distribution will be used.
  • Possible values for "Convolution" include:
      None            no augmentation
      signal          File or Audio object to be convolved with input
      {mix,signal}    signal to be convolved with input and mix parameter
  • Possible values for "Noise" include:
      None           no augmentation
      amp            white noise with amplitude amp
      noise          File or Audio object containing the noise signal to be added
      {amp,noise}    the noise signal noise, added with the specified amplitude amp
  • Use "TimeShift"->t to shift the input by t seconds, padding or trimming if necessary. Use Scaled[s] to shift the input by s×dur seconds, where dur is the duration of the input signal. Use {t1,t2} or Scaled[{s1,s2}] to randomize the shift between the specified bounds.
  • Use "Volume"->val to specify a constant multiplier.
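
As a rough illustration of how these parameters combine, the following sketch builds a single encoder using several of the settings listed above. The specific values (8000 Hz, a one-second target length, the augmentation ranges) are arbitrary choices made for illustration and are not recommended defaults.

```wolfram
(* Illustrative values only; none of these are defaults *)
enc = NetEncoder[{"Audio",
   "SampleRate" -> 8000,                       (* resample to 8 kHz *)
   "Normalization" -> {"RMS", 0.1},            (* normalize the RMS of the signal to 0.1 *)
   "TargetLength" -> Quantity[1, "Seconds"],   (* pad or trim to one second *)
   "Augmentation" -> {
     "Volume" -> {0.5, 1.5},                   (* multiplier drawn uniformly between the bounds *)
     "Noise" -> 0.01,                          (* white noise with amplitude 0.01 *)
     "TimeShift" -> {0, 0.05}                  (* shift randomized between 0 and 0.05 seconds *)
   }}]
```

Giving a two-element list such as {0.5, 1.5} uses the uniform randomization described above; a single number would fix the value instead.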

Examples

Basic Examples  (2)

Create an audio NetEncoder:

Create an Audio object with three samples:

Apply the encoder to the Audio object:

Plot the result of the encoder:
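
The four steps above might look like the following. This is a minimal sketch: the sample values are arbitrary, and the sample rate is set to 16000 so that it matches the encoder's default and no resampling takes place.

```wolfram
(* Create the encoder with default parameters *)
enc = NetEncoder["Audio"]

(* A three-sample signal at the encoder's default sample rate *)
a = Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]

(* Apply the encoder; per the Details section the result is an n×1 matrix *)
result = enc[a]
Dimensions[result]

(* Plot the encoded waveform as a flat list of samples *)
ListLinePlot[Flatten[Normal[result]]]
```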

Scope  (3)

NetEncoder["Audio"] can encode either File or Audio objects. Create an audio encoder:

Apply the encoder to a File object:

Apply the encoder to an in-core Audio object:

Apply the encoder to an out-of-core Audio object:
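
A possible sketch of the three input kinds follows. The path "ExampleData/rule30.wav" is an assumption (a sample file commonly shipped with the documentation); substitute any audio file available on your system.

```wolfram
enc = NetEncoder["Audio"]

(* File input; the path is an assumed sample file *)
Dimensions[enc[File["ExampleData/rule30.wav"]]]

(* In-core Audio object: the samples are held in memory *)
enc[Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]]

(* Out-of-core Audio object: Audio[File[...]] refers to the file rather than loading it *)
Dimensions[enc[Audio[File["ExampleData/rule30.wav"]]]]
```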

Create a list of Audio objects:

NetEncoder["Audio"] maps across a batch of inputs:
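
Batch application could be sketched as below, using two arbitrary three-sample signals at the default sample rate.

```wolfram
enc = NetEncoder["Audio"]

(* Two short signals; values are arbitrary *)
batch = {
  Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000],
  Audio[{-0.4, 0., 0.4}, SampleRate -> 16000]
}

(* A list of inputs yields a list of outputs, one per Audio object *)
Dimensions /@ enc[batch]
```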

Create an audio NetEncoder:

Attach the encoder to the input of a net:

Apply the net to an Audio object:
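
One way to carry out these steps is shown below. The net architecture (a small recurrent layer followed by SequenceLastLayer) is an arbitrary choice for illustration, and the net is randomly initialized, so its output values carry no meaning.

```wolfram
enc = NetEncoder["Audio"]

(* Attach the encoder to the "Input" port; the layers are illustrative only *)
net = NetInitialize[
  NetChain[{LongShortTermMemoryLayer[4], SequenceLastLayer[]}, "Input" -> enc]
]

(* The net now accepts an Audio object directly *)
net[Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]]
```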

Parameters  (3)

"Normalization"  (1)

Create an Audio object with three samples:

Use an encoder with "Normalization"->None to avoid any normalization:

Use an encoder with "Normalization"->"Max" to normalize the maximum absolute value to 1:

Find the minimum and maximum value of the result:
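
A sketch of this comparison, using the "Max" setting listed in the Details section and arbitrary sample values:

```wolfram
a = Audio[{-0.25, 0., 0.5}, SampleRate -> 16000]

(* No normalization: the samples are passed through unscaled *)
raw = NetEncoder[{"Audio", "Normalization" -> None}][a]

(* "Max": scale so that the largest absolute value becomes 1 *)
norm = NetEncoder[{"Audio", "Normalization" -> "Max"}][a]

(* Inspect the range of the normalized result *)
MinMax[Normal[norm]]
```

If the scaling is by the absolute maximum, as the Details section describes, the range reported for this input should be close to {-0.5, 1.}.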

"SampleRate"  (1)

Create an Audio object with three samples with a sample rate of 16000:

An encoder with a lower sample rate than the original audio will result in fewer samples:

An encoder with a higher sample rate than the original audio will result in more samples:
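
The effect of resampling can be checked with a sketch like the one below. The target rates of 8000 and 32000 are arbitrary; the exact output lengths depend on the resampling method and are not asserted here.

```wolfram
a = Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]

(* Lower target rate: fewer output samples than the three input samples *)
Length[NetEncoder[{"Audio", "SampleRate" -> 8000}][a]]

(* Higher target rate: more output samples *)
Length[NetEncoder[{"Audio", "SampleRate" -> 32000}][a]]
```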

"TargetLength"  (1)

Create an Audio object with three samples:

Using an encoder with "TargetLength"->All returns all three samples:

Using an encoder with "TargetLength"->5 zero-pads the output to be of length 5:

Using an encoder with "TargetLength"->2 takes only the first two samples:
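
The three settings could be compared with a sketch such as this one, using arbitrary sample values at the default sample rate:

```wolfram
a = Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]

(* All: the output keeps the length of the input *)
NetEncoder[{"Audio", "TargetLength" -> All}][a]

(* Longer than the input: the output is padded to five samples *)
NetEncoder[{"Audio", "TargetLength" -> 5}][a]

(* Shorter than the input: only the first two samples are kept *)
NetEncoder[{"Audio", "TargetLength" -> 2}][a]
```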

Possible Issues  (1)

If the input is a multi-channel signal, the mean of the channels is returned:
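
A small sketch of this behavior follows, assuming a two-channel signal built from two lists of arbitrary constant values.

```wolfram
(* Two channels with constant values 0.8 and 0.2 *)
stereo = Audio[{{0.8, 0.8, 0.8}, {0.2, 0.2, 0.2}}, SampleRate -> 16000]
AudioChannels[stereo]

(* The encoder output is still a single-column matrix *)
NetEncoder["Audio"][stereo]
```

Given the channel averaging stated above, each output sample for this input should be about 0.5.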