"Audio" (Net Encoder)
NetEncoder["Audio"]
represents an encoder that converts an audio file or object into a tensor of audio samples.
NetEncoder[{"Audio","param"->val,…}]
represents an encoder with specific parameters for preprocessing.
Details
- The "Audio" encoder returns the waveform of the signal. All the information that was in the original signal is present in the waveform.
- NetEncoder[…][input] applies the encoder to an input to produce an output.
- NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
- The input to the encoder can be an Audio object or a File[…] expression.
- The output of the encoder is a matrix of size n×1, where n is the number of audio samples after the preprocessing is applied.
- An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
- The following general parameters are supported:
    parameter         default   description
    "Augmentation"    None      augmentation to be applied
    "Normalization"   None      whether to apply normalization
    "SampleRate"      16000     target sample rate
    "TargetLength"    All       target output length
- The following settings and suboptions can be specified for each encoder parameter.
- "Normalization" can take the following settings:
    None            no normalization
    "Max"           absolute maximum value normalized to 1
    {"Max",val}     absolute maximum value normalized to val
    {"RMS",val}     RMS of the input audio signal normalized to val
- "TargetLength" can take the following settings:
    All     same length as the input signal
    dur     the duration dur, specified as a time quantity
    n       the first n samples
- If the specified "TargetLength" does not match the length of the input signal, the signal is padded or trimmed accordingly.
- "Augmentation" can be specified as a list of rules with the following keys:
    key             default   description
    "Convolution"   None      convolves the input with an impulse response
    "Noise"         None      adds noise to the input
    "TimeShift"     None      shifts the input by a specified amount
    "Volume"        None      multiplies the input by a constant
- Any augmentation parameter that accepts a numeric value can also be specified as a list of two numbers or as a univariate distribution. With a list of two numbers, the value is drawn from a uniform distribution between the given bounds; with a distribution, the value is drawn from that distribution.
- Possible values for "Convolution" include:
    None            no augmentation
    signal          File or Audio object to be convolved with the input
    {mix,signal}    signal to be convolved with the input, using the mixing parameter mix
- Possible values for "Noise" include:
    None            no augmentation
    amp             white noise with amplitude amp
    noise           File or Audio object containing the noise signal to be added
    {amp,noise}     the noise signal noise, added with amplitude amp
- Use "TimeShift"->t to shift the input by t seconds, padding or trimming if necessary. Use Scaled[s] to shift the input by s×dur seconds, where dur is the duration of the input signal. Use {t1,t2} or Scaled[{s1,s2}] to randomize the shift between the specified bounds.
- Use "Volume"->val to specify a constant multiplier.
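The preprocessing described in the Details section can be modeled in plain Python to make the settings concrete. This is a conceptual sketch, not the Wolfram Language implementation: all function names are hypothetical, a waveform is a list of floats at a fixed sample rate, a duration is a float in seconds instead of a time quantity, a {min,max} list becomes a Python tuple, a distribution becomes a callable, and zero-padding plus the order "Volume", "TimeShift", normalization, length fitting are assumptions. "Noise", "Convolution" and Scaled shifts are not modeled.

```python
import math
import random

SAMPLE_RATE = 16000  # default "SampleRate" from the parameter table

def draw(value, rng):
    """Resolve an augmentation value: (lo, hi) -> uniform draw, callable -> custom distribution."""
    if isinstance(value, tuple):
        return rng.uniform(*value)
    if callable(value):
        return value(rng)
    return value

def normalize(x, spec=None):
    """Model "Normalization": None, "Max", ("Max", val) or ("RMS", val)."""
    if spec is None:
        return list(x)
    kind, target = ("Max", 1.0) if spec == "Max" else spec
    if kind == "Max":
        ref = max(abs(s) for s in x)
    elif kind == "RMS":
        ref = math.sqrt(sum(s * s for s in x) / len(x))
    else:
        raise ValueError(f"unknown normalization {kind!r}")
    if ref == 0:
        return list(x)  # silent input: nothing to rescale (assumption)
    return [s * target / ref for s in x]

def fit_length(x, target=None, rate=SAMPLE_RATE):
    """Model "TargetLength": None (All), int n (samples) or float (seconds)."""
    if target is None:
        return list(x)
    n = round(target * rate) if isinstance(target, float) else target
    out = list(x[:n])                     # trim if too long
    return out + [0.0] * (n - len(out))   # zero-pad if too short (assumed padding value)

def shift(x, seconds, rate=SAMPLE_RATE):
    """Model "TimeShift": positive delays, negative advances; length is preserved."""
    k = round(seconds * rate)
    n = len(x)
    if k >= 0:
        return ([0.0] * k + list(x))[:n]
    return (list(x[-k:]) + [0.0] * (-k))[:n]

def encode(x, normalization=None, target_length=None, volume=None,
           time_shift=None, rate=SAMPLE_RATE, seed=None):
    """Return the preprocessed waveform as an n x 1 matrix (list of one-element rows)."""
    rng = random.Random(seed)
    y = list(x)
    if volume is not None:
        gain = draw(volume, rng)
        y = [gain * s for s in y]
    if time_shift is not None:
        y = shift(y, draw(time_shift, rng), rate)
    y = normalize(y, normalization)
    y = fit_length(y, target_length, rate)
    return [[s] for s in y]
```

For example, `encode([0.1, -0.5, 0.25], normalization="Max")` returns `[[0.2], [-1.0], [0.5]]`, and `encode([1.0, 2.0], target_length=4)` pads to `[[1.0], [2.0], [0.0], [0.0]]`.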
Parameters
Examples
Basic Examples (2)
Create an audio NetEncoder:
Create an Audio object with three samples:
Apply the encoder to the Audio object:
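The n×1 output layout can be illustrated with a plain-Python model in which each sample becomes a one-element row. This is a sketch with a hypothetical function name and made-up sample values standing in for the three-sample Audio object; it is not the Wolfram Language encoder.

```python
def encode_waveform(samples):
    """Turn a single-channel waveform into an n x 1 matrix: one row per sample, one column."""
    return [[float(s)] for s in samples]

# Three hypothetical sample values, standing in for the three-sample Audio object.
matrix = encode_waveform([0.1, 0.2, 0.3])
print(len(matrix), len(matrix[0]))  # 3 1
print(matrix)                       # [[0.1], [0.2], [0.3]]
```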
Scope (3)
NetEncoder["Audio"] can encode either File or Audio objects. Create an audio encoder:
Apply the encoder to a File object:
Apply the encoder to an in-core Audio object:
Apply the encoder to an out-of-core Audio object:
Create a list of Audio objects:
NetEncoder["Audio"] maps across a batch of inputs:
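Mapping over a batch encodes each input independently, so with the default "TargetLength"->All the outputs can have different numbers of rows. The plain-Python sketch below models this with hypothetical helper names and data; fixing the target length (here as a sample count, with assumed zero-padding) makes every output the same shape.

```python
def encode_one(samples, target_length=None):
    """Encode one waveform as an n x 1 matrix, optionally trimmed/zero-padded to target_length samples."""
    x = list(samples)
    if target_length is not None:
        x = x[:target_length] + [0.0] * max(0, target_length - len(x))
    return [[s] for s in x]

def encode_batch(batch, target_length=None):
    """Map the single-input encoder over a list of inputs."""
    return [encode_one(s, target_length) for s in batch]

batch = [[0.1, 0.2, 0.3], [0.5, -0.5]]  # two hypothetical waveforms of different lengths
print([(len(m), len(m[0])) for m in encode_batch(batch)])                   # [(3, 1), (2, 1)]
print([(len(m), len(m[0])) for m in encode_batch(batch, target_length=4)])  # [(4, 1), (4, 1)]
```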
Create an audio NetEncoder:
Attach the encoder to the input of a net:
Apply the net to an Audio object:
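Conceptually, attaching an encoder to a port means the raw input is encoded before any layer runs, so the layers only ever see the n×1 matrix. The Python sketch below models that idea with a hypothetical `EncodedNet` class and a toy averaging layer; it illustrates the data flow only and is not how Wolfram Language nets are implemented.

```python
class EncodedNet:
    """Hypothetical stand-in for a net whose input port has an encoder attached."""

    def __init__(self, encoder, body):
        self.encoder = encoder  # runs on the raw input (e.g. a waveform)
        self.body = body        # only ever sees the encoded n x 1 matrix

    def __call__(self, raw):
        return self.body(self.encoder(raw))

def to_column(samples):
    """Encode a waveform as an n x 1 matrix."""
    return [[float(s)] for s in samples]

def mean_layer(matrix):
    """Toy 'net': average over the n rows of the n x 1 matrix."""
    return sum(row[0] for row in matrix) / len(matrix)

net = EncodedNet(to_column, mean_layer)
print(net([0.25, 0.5, 0.75]))  # 0.5
```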
Parameters (3)
"Normalization" (1)
"SampleRate" (1)
Create an Audio object with three samples with a sample rate of 16000:
An encoder with a lower sample rate than the original audio will result in fewer samples:
An encoder with a higher sample rate than the original audio will result in more samples:
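The change in sample count follows from keeping the duration fixed: resampling n samples from the original rate to the target rate gives roughly n×(target rate)/(original rate) samples. The Python sketch below resamples by linear interpolation; the interpolation method, the rounding of the output length and the absence of an anti-aliasing filter are assumptions for illustration, so the Wolfram Language encoder may produce different counts and values.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a waveform by linear interpolation, keeping its duration (approximately) fixed."""
    n_in = len(samples)
    n_out = max(1, round(n_in * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        t = i * src_rate / dst_rate  # position of output sample i, in input-sample units
        j = int(t)
        frac = t - j
        a = samples[min(j, n_in - 1)]
        b = samples[min(j + 1, n_in - 1)]  # clamp at the last input sample
        out.append(a + (b - a) * frac)
    return out

x = [0.0, 1.0, 2.0]  # three samples at 16000 Hz
print(len(resample_linear(x, 16000, 8000)))   # 2
print(len(resample_linear(x, 16000, 32000)))  # 6
```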