"AudioSpectrogram" (Net Encoder)


NetEncoder["AudioSpectrogram"]
represents an encoder that converts an audio file or object into its spectrogram.


NetEncoder[{"AudioSpectrogram","param"->value,…}]
represents an encoder with specific parameters for preprocessing and feature computation.


  • NetEncoder[…][input] applies the encoder to an input to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
  • The input to the encoder can be an Audio object or a File[…] expression.
  • The output of the encoder is a rank-2 tensor of dimensions {n,Floor[(ws/2.)+1]}, where n is the number of partitions after the preprocessing is applied and ws is the length of the partitions used for the computation.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
Parameters
  • The following parameters are supported:
    "Normalization"   None        whether to apply normalization
    "SampleRate"      16000       target sample rate
    "TargetLength"    All         target output length
    "WindowSize"      Automatic   length of the partitions
    "Offset"          Automatic   offset of the partitions
  • With the parameter "Normalization"->None, no normalization is applied.
  • With the parameter "Normalization"->Automatic, the signal is normalized to its maximum absolute value. The normalization is applied to the sample values before the short-time Fourier transform is computed.
  • With the parameter "TargetLength"->All, the output of the encoder includes all available audio samples from the input audio.
  • With the parameter "TargetLength"->n, the output of the encoder will be the first n audio samples from the input audio, with zero padding applied if n is larger than the number of audio samples.
  • With the parameter "WindowSize"->Automatic, a partition length is computed as Ceiling[0.025*sr], where sr is the sample rate "SampleRate". Use "WindowSize"->n to select a partition length of n samples.
  • With the parameter "Offset"->Automatic, an offset is computed as Ceiling[ws/3], where ws is the partition length "WindowSize". Use "Offset"->n to select a partition offset of n samples.
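
The Automatic rules for "WindowSize" and "Offset", together with the stated output dimensions, can be checked with plain arithmetic. The following is a minimal sketch that assumes only the formulas quoted in the notes above. The variable names sr, ws, offset, bins and enc are chosen here for illustration, and the explicit "WindowSize" and "Offset" values passed to the encoder are arbitrary.

```wolfram
(* Default target sample rate, taken from the parameter table *)
sr = 16000;

(* "WindowSize" -> Automatic corresponds to Ceiling[0.025*sr] *)
ws = Ceiling[0.025*sr]      (* 400 *)

(* "Offset" -> Automatic corresponds to Ceiling[ws/3] *)
offset = Ceiling[ws/3]      (* 134 *)

(* Second dimension of the encoder output: Floor[(ws/2.)+1] *)
bins = Floor[(ws/2.) + 1]   (* 201 *)

(* An encoder with explicit, arbitrarily chosen parameter values *)
enc = NetEncoder[{"AudioSpectrogram",
   "WindowSize" -> 512,
   "Offset" -> 128,
   "Normalization" -> Automatic}]
```

By the same formula, an encoder with "WindowSize"->512 would be expected to produce Floor[(512/2.)+1] = 257 values per partition.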



Basic Examples  (1)

Create a spectrogram NetEncoder:


Create an Audio object:


Apply the encoder to the Audio object:


Plot the result:

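
The four steps of this example can be sketched as follows. This is a minimal sketch rather than the page's original input cells: the variable names enc, audio and spec, and the choice of a one-second 440 Hz test tone, are assumptions made for illustration.

```wolfram
(* Create a spectrogram encoder with default parameters *)
enc = NetEncoder["AudioSpectrogram"];

(* Create an Audio object: one second of a 440 Hz sine tone (illustrative input) *)
audio = AudioGenerator[{"Sin", 440}, 1];

(* Apply the encoder to the Audio object; the result is a rank-2 array *)
spec = enc[audio];
Dimensions[spec]

(* Plot the result *)
MatrixPlot[spec]
```

With the default parameters described above, the second dimension reported by Dimensions should follow Floor[(ws/2.)+1] with ws = 400, that is, 201. The first dimension depends on the length of the input audio.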

Scope  (3)

Parameters  (6)

Properties & Relations  (2)

See Also

NetEncoder  Audio  SpectrogramArray  AudioResample  ConformAudio  NetChain  NetGraph  NetTrain


Related NetEncoders