"Audio" (Net Encoder)

NetEncoder["Audio"]

represents an encoder that converts an audio file or object into a tensor of audio samples.

NetEncoder[{"Audio","param"->val,…}]

represents an encoder with specific parameters for preprocessing.

Details

The "Audio" encoder returns the waveform of the signal. All the information that was in the original signal is present in the waveform.
NetEncoder[…][input] applies the encoder to an input to produce an output.
NetEncoder[…][{input₁,input₂,…}] applies the encoder to a list of inputs to produce a list of outputs.
The input to the encoder can be an Audio object or a File[…] expression.
The output of the encoder is a matrix of size n×1, where n is the number of audio samples after the preprocessing is applied.
An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.

Parameters

The following general parameters are supported:

"Augmentation"	None	augmentation to be applied
"Normalization"	None	whether to apply normalization
"SampleRate"	16000	target sample rate
"TargetLength"	All	target output length

The following settings and suboptions can be specified for each encoder parameter.
"Normalization" can take the following settings:

	None	no normalization
	"Max"	absolute maximum value normalized to 1
	{"Max",val}	absolute maximum value normalized to val
	{"RMS",val}	RMS of input audio signal normalized to val
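The scaled settings can be checked on a small signal. The following is a minimal sketch, assuming an illustrative three-sample Audio object and target values (0.8 and 0.1) that are not part of the original page:

```wolfram
(* Illustrative three-sample signal; the sample values are assumptions *)
a = Audio[{-0.25, 0., 0.5}, SampleRate -> 16000];

(* Scale so that the largest absolute sample value becomes 0.8 *)
peak = NetEncoder[{"Audio", "Normalization" -> {"Max", 0.8}}];
Max[Abs[Flatten[peak[a]]]]

(* Scale so that the RMS of the signal becomes 0.1 *)
rms = NetEncoder[{"Audio", "Normalization" -> {"RMS", 0.1}}];
RootMeanSquare[Flatten[rms[a]]]
```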

"TargetLength" can take the following settings:

	All	same as input signal
	dur	the duration dur specified as a time quantity
	n	the first n samples

If the specified "TargetLength" does not match the length of the input signal, the signal is zero-padded or trimmed accordingly.
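A duration can be given with Quantity. This sketch assumes a half-second target and a short illustrative input, so the output would be padded; both choices are assumptions made for illustration:

```wolfram
(* Illustrative three-sample input at the encoder's default sample rate *)
a = Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000];

(* Target length given as a time quantity rather than a sample count *)
enc = NetEncoder[{"Audio", "TargetLength" -> Quantity[0.5, "Seconds"]}];

(* Inspect the shape of the encoded output *)
Dimensions[enc[a]]
```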
"Augmentation" can be specified as a list of rules with the following keys:

"Convolution"	None	convolves the input with an impulse response
"Noise"	None	adds noise to the input
"TimeShift"	None	shifts the input by a specified amount
"Volume"	None	multiplies the input by a constant

Any augmentation parameter that accepts a numeric value can also be specified as a list of two numbers or a univariate distribution. In the first case, the value will be randomized according to a uniform distribution between the given bounds. In the second, the user-provided distribution will be used.
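Both randomized forms can be sketched with the "Volume" key. The bounds and the distribution here are illustrative assumptions. This page does not state whether augmentation is applied when an encoder is called directly or only during training, so the sketch stops at constructing the encoders:

```wolfram
(* Volume drawn uniformly between the two bounds (assumed values) *)
encRange = NetEncoder[{"Audio", "Augmentation" -> {"Volume" -> {0.5, 1.5}}}];

(* Volume drawn from a user-provided univariate distribution (assumed parameters) *)
encDist = NetEncoder[{"Audio",
   "Augmentation" -> {"Volume" -> NormalDistribution[1, 0.1]}}];
```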
Possible values for "Convolution" include:

	None	no augmentation
	signal	File or Audio object to be convolved with input
	{mix,signal}	signal to be convolved with input and mix parameter

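As a rough sketch of the two non-trivial forms, the following uses a hypothetical three-sample impulse response and an assumed mix value of 0.5; neither comes from the original page:

```wolfram
(* Hypothetical short impulse response; the values are chosen only for illustration *)
ir = Audio[{1., 0.5, 0.25}, SampleRate -> 16000];

(* Convolve every input with the impulse response *)
encConv = NetEncoder[{"Audio", "Augmentation" -> {"Convolution" -> ir}}];

(* Same impulse response, with an explicit mix parameter *)
encMix = NetEncoder[{"Audio", "Augmentation" -> {"Convolution" -> {0.5, ir}}}];
```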
Possible values for "Noise" include:

	None	no augmentation
	amp	white noise with amplitude amp
	noise	File or Audio object containing the noise signal to be added
	{amp,noise}	noise signal added at the specified amplitude amp
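A minimal sketch of the numeric and signal-based forms, assuming illustrative amplitudes and a one-second pink-noise signal generated with AudioGenerator as a stand-in for a recorded noise file:

```wolfram
(* White noise at an assumed amplitude of 0.01 *)
encWhite = NetEncoder[{"Audio", "Augmentation" -> {"Noise" -> 0.01}}];

(* Stand-in noise signal; a File or recorded Audio object could be used instead *)
noise = AudioGenerator["Pink", 1, SampleRate -> 16000];

(* The same noise signal together with an assumed amplitude of 0.05 *)
encNoise = NetEncoder[{"Audio", "Augmentation" -> {"Noise" -> {0.05, noise}}}];
```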

Use "TimeShift"->t to shift the input by t seconds, padding or trimming if necessary. Use Scaled[s] to shift the input by s×dur seconds, where dur is the duration of the input signal. Use {t₁,t₂} or Scaled[{s₁,s₂}] to randomize the shift between the specified bounds.
Use "Volume"->val to specify a constant multiplier.
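The shift and volume forms described above might be written as follows. All numeric values are assumptions chosen for illustration:

```wolfram
(* Fixed shift of 0.1 seconds *)
encShift = NetEncoder[{"Audio", "Augmentation" -> {"TimeShift" -> 0.1}}];

(* Random shift between 0 and 20% of the input duration *)
encScaled = NetEncoder[{"Audio",
   "Augmentation" -> {"TimeShift" -> Scaled[{0, 0.2}]}}];

(* Random shift between 0 and 0.1 seconds, combined with a constant volume multiplier *)
encBoth = NetEncoder[{"Audio",
   "Augmentation" -> {"TimeShift" -> {0, 0.1}, "Volume" -> 0.5}}];
```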

Examples


Basic Examples (2)

Create an audio NetEncoder:

Create an Audio object with three samples:

Apply the encoder to the Audio object:

Plot the result of the encoder:
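The four steps above can be sketched as follows. The sample values are illustrative assumptions, not the original inputs:

```wolfram
(* Create an audio NetEncoder with default parameters *)
enc = NetEncoder["Audio"]

(* Audio object with three samples at the encoder's default sample rate *)
a = Audio[{-0.5, 0., 0.5}, SampleRate -> 16000]

(* Apply the encoder; the result is a 3x1 matrix of samples *)
out = enc[a]

(* Flatten the n x 1 matrix to a plain list before plotting *)
ListLinePlot[Flatten[out]]
```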

Scope (3)

NetEncoder["Audio"] can encode either File or Audio objects. Create an audio encoder:

Apply the encoder to a File object:

Apply the encoder to an in-core Audio object:

Apply the encoder to an out-of-core Audio object:
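A sketch of the three input kinds, assuming a short sine tone written to a temporary WAV file. The signal and the file name are hypothetical:

```wolfram
enc = NetEncoder["Audio"];

(* In-core Audio object built from sample data: 0.1 s of a 440 Hz sine *)
inCore = Audio[Table[0.5 Sin[2 Pi 440 t/16000.], {t, 0, 1599}],
   SampleRate -> 16000];

(* Write the signal to a temporary WAV file; Export returns the file path *)
path = Export[
   FileNameJoin[{$TemporaryDirectory, "audio-encoder-example.wav"}], inCore];

fromFile = enc[File[path]];        (* File expression *)
fromInCore = enc[inCore];          (* in-core Audio object *)
fromOutOfCore = enc[Audio[path]];  (* Audio object that references the file *)

Dimensions /@ {fromFile, fromInCore, fromOutOfCore}
```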

Create a list of Audio objects:

NetEncoder["Audio"] maps across a batch of inputs:
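A batch sketch, assuming two short signals of different lengths (illustrative values). Each input is encoded to its own n×1 matrix:

```wolfram
enc = NetEncoder["Audio"];

(* Two illustrative Audio objects with three and four samples *)
batch = {
   Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000],
   Audio[{-0.1, -0.2, -0.3, -0.4}, SampleRate -> 16000]};

(* The encoder is applied to each element of the list *)
Dimensions /@ enc[batch]
```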

Create an audio NetEncoder:

Attach the encoder to the input of a net:

Apply the net to an Audio object:
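One way to sketch this is with a small recurrent net. The layer choice (a GatedRecurrentLayer followed by SequenceLastLayer) and the randomly initialized weights are assumptions for illustration, not the net used in the original example:

```wolfram
enc = NetEncoder["Audio"];

(* Attach the encoder to the "Input" port while constructing the net *)
net = NetInitialize[
   NetChain[{GatedRecurrentLayer[4], SequenceLastLayer[]}, "Input" -> enc]];

(* The net now accepts Audio objects directly *)
a = Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000];
result = net[a]
```

Because the encoder produces a sequence whose length depends on the input, the sketch uses layers that accept variable-length sequences.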

Parameters (3)

"Normalization" (1)

Create an Audio object with three samples:

Use an encoder with "Normalization"->None to avoid any normalization:

Use an encoder with "Normalization"->Automatic to normalize the maximum absolute value to 1:

Find the minimum and maximum value of the result:
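A sketch of the comparison, using the "Max" setting from the settings table to normalize the maximum absolute value to 1. The sample values are illustrative assumptions:

```wolfram
(* Illustrative signal whose largest absolute value is 0.5 *)
a = Audio[{-0.25, 0., 0.5}, SampleRate -> 16000];

(* No normalization: samples are passed through unscaled *)
raw = NetEncoder[{"Audio", "Normalization" -> None}];
raw[a]

(* Peak normalization: the largest absolute value is scaled to 1 *)
norm = NetEncoder[{"Audio", "Normalization" -> "Max"}];
MinMax[Flatten[norm[a]]]
```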

"SampleRate" (1)

Create an Audio object with three samples and a sample rate of 16000:

An encoder with a lower sample rate than the original audio will result in fewer samples:

An encoder with a higher sample rate than the original audio will result in more samples:
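The effect of resampling can be sketched by comparing output lengths. The target rates of 8000 and 32000 are assumed values:

```wolfram
(* Three samples at 16000 Hz *)
a = Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000];

(* Encoders that resample to a lower and to a higher rate *)
low = NetEncoder[{"Audio", "SampleRate" -> 8000}];
high = NetEncoder[{"Audio", "SampleRate" -> 32000}];

(* Number of samples after resampling down and up *)
{Length[low[a]], Length[high[a]]}
```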

"TargetLength" (1)

Create an Audio object with three samples:

Using an encoder with "TargetLength"->All returns all three samples:

Using an encoder with "TargetLength"->5 zero-pads the output to be of length 5:

Using an encoder with "TargetLength"->2 takes only the first two samples:
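The three settings can be sketched side by side. The sample values are illustrative assumptions:

```wolfram
(* Audio object with three samples *)
a = Audio[{0.1, 0.2, 0.3}, SampleRate -> 16000];

(* All: keep the input length *)
all = NetEncoder[{"Audio", "TargetLength" -> All}][a]

(* 5: pad with zeros up to five samples *)
padded = NetEncoder[{"Audio", "TargetLength" -> 5}][a]

(* 2: keep only the first two samples *)
trimmed = NetEncoder[{"Audio", "TargetLength" -> 2}][a]
```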

Possible Issues (1)

If the input is a multi-channel signal, the mean of the channels is returned:
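A sketch with a two-channel signal, assuming one channel at a constant 0.5 and the other silent (illustrative values), so that the channel mean is easy to read off:

```wolfram
(* Two channels with three samples each: rows are channels *)
stereo = Audio[{{0.5, 0.5, 0.5}, {0., 0., 0.}}, SampleRate -> 16000];
AudioChannels[stereo]

(* The encoder returns a single column holding the mean of the channels *)
mono = NetEncoder["Audio"][stereo]
```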
