"AudioSTFT" (Net Encoder)


NetEncoder["AudioSTFT"]
represents an encoder that converts an audio file or object into its short-time Fourier transform.


NetEncoder[{"AudioSTFT","param"->value,…}]
represents an encoder with specific parameters for preprocessing.


  • NetEncoder[…][input] applies the encoder to an input to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of inputs to produce a list of outputs.
  • The input to the encoder can be an Audio object or a File[…] expression.
  • The output of the encoder is a rank-3 tensor of dimensions {n,ws,2}, where n is the number of partitions after the preprocessing is applied and ws is the length of the partitions used for the computation. The last dimension represents the real and imaginary parts of the result.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
  • Parameters
  • The following parameters are supported:
  • "Normalization"    None         whether to apply normalization
    "SampleRate"       16000        target sample rate
    "TargetLength"     All          target output length
    "WindowSize"       Automatic    length of the partitions
    "Offset"           Automatic    offset of the partitions
  • With the parameter "Normalization"->None, no normalization is applied.
  • With the parameter "Normalization"->Automatic, the signal is normalized to its maximum absolute value. The normalization is applied to the sample values before the short-time Fourier transform is computed.
  • With the parameter "TargetLength"->All, the output of the encoder includes all available audio samples from the input audio.
  • With the parameter "TargetLength"->n, the output of the encoder will be the first n audio samples from the input audio, with zero padding applied if n is larger than the number of audio samples.
  • With the parameter "WindowSize"->Automatic, the partition length is computed as Ceiling[0.025*sr], where sr is the value of "SampleRate". Use "WindowSize"->n to select a partition length of n samples.
  • With the parameter "Offset"->Automatic, the offset is computed as Ceiling[ws/3], where ws is the value of "WindowSize". Use "Offset"->n to select a partition offset of n samples.
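To make the Automatic settings concrete, the following sketch evaluates the two formulas above for the default sample rate and then passes the resulting values explicitly. The variable names sr, ws, offset and enc are illustrative only, and it is an assumption, not something verified here, that the explicit encoder behaves the same as one created with the Automatic settings.

```wolfram
(* default "SampleRate" from the parameter table *)
sr = 16000;

(* partition length used by "WindowSize" -> Automatic *)
ws = Ceiling[0.025*sr];

(* partition offset used by "Offset" -> Automatic *)
offset = Ceiling[ws/3];

(* pass the computed values explicitly instead of relying on Automatic *)
enc = NetEncoder[{"AudioSTFT",
   "SampleRate" -> sr,
   "WindowSize" -> ws,
   "Offset" -> offset}]
```

Changing sr and re-evaluating shows how the partition length and offset scale with the sample rate.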



Basic Examples  (1)

Create an audio STFT NetEncoder:


Create an Audio object:


Apply the encoder to the Audio object:

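The input cells for the three steps above are not reproduced here, so the following is only a minimal sketch of what they might look like. It assumes a synthetic one-second 440 Hz sine tone from AudioGenerator as a stand-in for the Audio object; the variable names encoder, audio and result are illustrative.

```wolfram
(* create the encoder with its default parameters *)
encoder = NetEncoder["AudioSTFT"]

(* illustrative input: one second of a 440 Hz sine tone *)
audio = AudioGenerator[{"Sin", 440}, 1]

(* apply the encoder and inspect the dimensions of the result *)
result = encoder[audio];
Dimensions[result]
```

According to the description of the output above, the dimensions should have the form {n,ws,2}, with the last dimension holding the real and imaginary parts.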


See Also

NetEncoder  Audio  SpectrogramArray  AudioResample  ConformAudio  NetChain  NetGraph  NetTrain

