"AudioMelSpectrogram" (Net Encoder)
NetEncoder["AudioMelSpectrogram"]
represents an encoder that converts an audio file or object into its mel-frequency spectrogram.
NetEncoder[{"AudioMelSpectrogram","param"->val,…}]
represents an encoder with specific parameters for preprocessing and feature computation.
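As a minimal sketch of the two forms above, the following constructs one encoder with default parameters and one with explicit parameters, then applies both to a synthetic tone. The test signal built with AudioGenerator, the variable names and the chosen parameter values are illustrative assumptions, not taken from this page.

```wolfram
(* Encoder with default parameters *)
enc = NetEncoder["AudioMelSpectrogram"];

(* Encoder with explicit parameters; the values are illustrative only *)
enc64 = NetEncoder[{"AudioMelSpectrogram",
    "SampleRate" -> 16000, "NumberOfFilters" -> 64}];

(* Synthetic test signal (an assumption): one second of a 440 Hz sine tone *)
audio = AudioGenerator[{"Sin", 440}, 1, SampleRate -> 16000];

(* Each result is a rank-2 array of dimensions {n, nf} *)
Dimensions[enc[audio]]
Dimensions[enc64[audio]]
```

The second dimension of each result should equal the number of filters: 40 for the default encoder and 64 for enc64.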
Details
- NetEncoder[…][input] applies the encoder to an input to produce an output.
- NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
- The input to the encoder can be an Audio object or a File[…] expression.
- The output is computed by filtering the spectrogram with nf bandpass filters whose center frequencies are linearly spaced on the mel scale.
- The output of the encoder is a rank-2 tensor of dimensions {n,nf}, where n is the number of partitions after the preprocessing is applied and nf is the number of filters used for the computation.
- An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
- The following parameters are supported:
"Normalization"       None         whether to apply normalization
"SampleRate"          16000        target sample rate
"TargetLength"        All          target output length
"WindowSize"          Automatic    length of the partitions
"Offset"              Automatic    offset of the partitions
"MinimumFrequency"    Automatic    minimum frequency of the mel filters
"MaximumFrequency"    Automatic    maximum frequency of the mel filters
"NumberOfFilters"     40           number of the mel filters
- With the parameter "Normalization"->None, no normalization is applied.
- With the parameter "Normalization"->Automatic, the signal is normalized to the maximum absolute value. The normalization is applied to the sample values before the short-time Fourier transform is computed.
- With the parameter "TargetLength"->All, the output of the encoder includes all available audio samples from the input audio.
- With the parameter "TargetLength"->n, the output of the encoder will be the first n audio samples from the input audio, with zero padding applied if n is larger than the number of audio samples.
- With the parameter "WindowSize"->Automatic, a partition length is computed as Ceiling[0.025*sr], where sr is the sample rate "SampleRate". Use "WindowSize"->n to select a partition length of n samples.
- With the parameter "Offset"->Automatic, an offset is computed as Ceiling[ws/3], where ws is the partition length "WindowSize". Use "Offset"->n to select a partition offset of n samples.
- With the parameter "MinimumFrequency"->Automatic, a frequency is computed as Ceiling[sr/ws], where sr is the sample rate "SampleRate" and ws is the partition length "WindowSize". Use "MinimumFrequency"->f to set the minimum frequency for the filters to f.
- With the parameter "MaximumFrequency"->Automatic, a frequency is computed as Round[Min[8000,sr/2]], where sr is the sample rate "SampleRate". Use "MaximumFrequency"->f to set the maximum frequency for the filters to f.
- With the parameter "NumberOfFilters"->n, n filters will be used in the computation of the mel-spectrogram.
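The Automatic rules above can be checked by direct evaluation. The sketch below works through them for the default sample rate of 16000. The variable names are illustrative, and the expressions simply transcribe the formulas stated on this page.

```wolfram
sr = 16000;                      (* default "SampleRate" *)
ws = Ceiling[0.025*sr];          (* "WindowSize": 400 samples, i.e. 25 ms *)
offset = Ceiling[ws/3];          (* "Offset": 134 samples *)
fmin = Ceiling[sr/ws];           (* "MinimumFrequency": 40 *)
fmax = Round[Min[8000, sr/2]];   (* "MaximumFrequency": 8000 *)

{ws, offset, fmin, fmax}         (* {400, 134, 40, 8000} *)
```

Under these rules, passing "WindowSize"->400 and "Offset"->134 explicitly should match the Automatic behavior at this sample rate.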
Parameters
Examples
Basic Examples (1)
Scope (3)
Parameters (9)
Properties & Relations (1)
See Also
NetEncoder Audio SpectrogramArray AudioResample ConformAudio NetChain NetGraph NetTrain