Audio Processing
The Wolfram Language provides built-in support for both programmatic and interactive audio processing, fully integrated with its other powerful mathematical and algorithmic capabilities. You can process audio objects by applying linear and nonlinear filters, adding effects, and analyzing them using audio-specific functions or by exploiting the extensive integration with the rest of the Wolfram Language.
LowpassFilter[audio,ωc] | apply a lowpass filter with a cutoff frequency ωc to audio |
HighpassFilter[audio,ωc] | apply a highpass filter with a cutoff frequency ωc to audio |
WienerFilter[audio,r] | apply Wiener filter with a range of r samples to audio |
MeanFilter[audio,r] | apply mean filter with a range of r samples to audio |
TotalVariationFilter[audio] | apply total variation filter to audio |
GaussianFilter[audio,r] | apply Gaussian filter with a range of r samples to audio |
Many of the filtering functions present in the Wolfram Language can be immediately used on audio objects. In many cases, it is possible to specify the cutoff frequency as a frequency Quantity.
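A minimal sketch of applying these filters (the test signal is synthesized with AudioGenerator rather than taken from a recording; the cutoff values are illustrative):

```wolfram
(* a two-second test signal: a 440 Hz sine tone plus white noise *)
audio = AudioGenerator["Sin", 2] + 0.1 AudioGenerator["White", 2];

(* the cutoff frequency can be given as a Quantity *)
low = LowpassFilter[audio, Quantity[1000, "Hertz"]];
high = HighpassFilter[audio, Quantity[2000, "Hertz"]];
```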
Use WienerFilter to denoise a recording:
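A sketch using a synthetic noisy signal in place of a recording (the noise level and neighborhood size are illustrative):

```wolfram
(* a sine tone corrupted with white noise *)
noisy = AudioGenerator["Sin", 2] + 0.2 AudioGenerator["White", 2];

(* estimate and remove the noise using a 500-sample neighborhood *)
denoised = WienerFilter[noisy, 500];
```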
Discrete-time transfer function models can be used to filter an audio object using RecurrenceFilter.
RecurrenceFilter[tf,audio] | uses a discrete-time filter defined by the TransferFunctionModel tf |
BiquadraticFilterModel[{"type",spec}] | creates a biquadratic filter of a given {"type",spec} |
ButterworthFilterModel[{"type",spec}] | creates a Butterworth filter of a given {"type",spec} |
TransferFunctionModel[m,s] | represents the model of the transfer-function matrix m with complex variable s |
ToDiscreteTimeModel[lsys,τ] | gives the discrete-time approximation, with sampling period τ, of the continuous-time system model lsys |
The simplest way to use one of the analog (continuous-time) filter models is to discretize the transfer function using ToDiscreteTimeModel and apply the result to an audio object using RecurrenceFilter.
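A sketch of this workflow (the filter order, cutoff, and test signal are illustrative):

```wolfram
audio = AudioGenerator["White", 2];

(* a sixth-order Butterworth lowpass with a 1000 Hz cutoff, given in rad/s *)
tf = ButterworthFilterModel[{6, 2 Pi 1000}];

(* discretize at the sample rate of the audio, then filter *)
dtf = ToDiscreteTimeModel[tf, 1/AudioSampleRate[audio]];
filtered = RecurrenceFilter[dtf, audio];
```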
A discrete transfer function can be created using TransferFunctionModel and applied to an audio object with RecurrenceFilter.
Define a comb filter using TransferFunctionModel and apply it to an audio object:
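A sketch, assuming a feedforward comb filter with a 100-sample delay and a gain of 0.8 (both values illustrative):

```wolfram
(* H(z) = 1 + 0.8 z^-100, written as a rational function in z *)
comb = TransferFunctionModel[(z^100 + 0.8)/z^100, z, SamplingPeriod -> 1];
filtered = RecurrenceFilter[comb, AudioGenerator["White", 1]];
```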
AudioTimeStretch[audio,r] | apply time stretching by the specified factor r to audio |
AudioPitchShift[audio,r] | apply pitch shifting by the specified factor r to audio |
AudioReverb[audio] | apply a reverberation effect to audio |
AudioDelay[audio,delay] | apply a delay effect with delay time delay to audio |
AudioChannelMix[audio,desttype] | mix the channels of audio to the specified desttype |
Use pitch shifting and time stretching to independently modify the pitch and duration of an audio signal.
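A sketch (SpeechSynthesize is used here only to produce a test signal; the factors are illustrative):

```wolfram
audio = SpeechSynthesize["The Wolfram Language"];

(* four semitones up, duration unchanged *)
shifted = AudioPitchShift[audio, 2^(4/12)];

(* 1.5 times longer, pitch unchanged *)
stretched = AudioTimeStretch[audio, 1.5];
```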
Delay and reverberation effects can be used to immerse a recording in a virtual environment or to produce special effects.
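A minimal sketch using default effect settings on a synthesized test signal:

```wolfram
audio = SpeechSynthesize["hello"];

(* reverberation with default settings *)
reverbed = AudioReverb[audio];

(* a 0.3-second delay *)
delayed = AudioDelay[audio, 0.3];
```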
Perform Karplus–Strong synthesis by adding a short delay with a high feedback value to a burst of noise. This will simulate the sound of a vibrating string:
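A sketch of the idea (the burst length, delay time, and feedback value are illustrative; a delay of 1/220 s tunes the string near 220 Hz):

```wolfram
(* a short burst of white noise *)
burst = AudioGenerator["White", 0.02];

(* a short delay with high feedback simulates the vibrating string *)
string = AudioDelay[burst, 1/220., 0.98];
```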
Downmixing and upmixing to an arbitrary number of channels can be achieved using AudioChannelMix.
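For example, upmixing a mono signal to stereo and back:

```wolfram
mono = AudioGenerator["Sin", 1];
stereo = AudioChannelMix[mono, "Stereo"];
AudioChannels[stereo]   (* 2 *)
remixed = AudioChannelMix[stereo, "Mono"];
```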
It is possible to alter a recording by performing arithmetic operations directly on the Audio object. All Wolfram Language operators and functions with the NumericFunction or Listable attribute are overloaded to work with audio objects.
Apply a smooth distortion to an audio object using the Tanh function:
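A sketch (the gain factor is illustrative):

```wolfram
audio = AudioGenerator["Sin", 1];

(* boost the signal, saturate it with Tanh, then renormalize *)
distorted = AudioNormalize[Tanh[10 audio]];
```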
Use the ChebyshevT function to obtain a "waveshaper" effect:
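A sketch; applying ChebyshevT[n,x] to a sine maps it to its nth harmonic (the order 5 is illustrative):

```wolfram
audio = AudioGenerator["Sin", 1];
shaped = AudioNormalize[ChebyshevT[5, audio]];
```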
Analysis of the Whole Signal
AudioMeasurements[audio,"prop"] | compute the property "prop" for the entire audio |
Both time domain and frequency domain properties can be measured with AudioMeasurements. The properties are computed on the average sample values over the channels of the audio object.
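A minimal sketch on a synthesized signal (the chosen properties are a small sample of those available):

```wolfram
audio = AudioGenerator["Sin", 1];

(* a single property *)
AudioMeasurements[audio, "RMSAmplitude"]

(* several properties at once *)
AudioMeasurements[audio, {"Min", "Max", "SpectralCentroid"}]
```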
Unlike AudioMeasurements, overloaded functions are applied to the flattened version of the data. If the input is a multichannel audio object, the sample values from all channels will be flattened into a single array.
Analysis of the Partitioned Signal
In addition to global properties of audio objects, it is also possible to compute measurements locally.
AudioLocalMeasurements[audio,"prop"] | compute the property "prop" locally for partitions of audio |
AudioIntervals[audio,crit] | find the intervals of audio for which the criterion crit is satisfied |
In AudioLocalMeasurements, properties are computed locally. The signal is partitioned according to the PartitionGranularity specification, and the requested property is computed on each partition. The result is returned as a TimeSeries whose timestamps correspond to the center of each partition.
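A sketch using a synthesized speech signal:

```wolfram
audio = SpeechSynthesize["hello world"];
rms = AudioLocalMeasurements[audio, "RMSAmplitude"];

(* the result is a TimeSeries and can be plotted directly *)
ListLinePlot[rms]
```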
The result of AudioLocalMeasurements or AudioMeasurements can be used as an input for other functionality in the Wolfram Language.
Use the "SpectralCentroid" and "SpectralSpread" measurements to find clusters of similar audio objects in a list:
Use the "MFCC" measurement as a feature to compute the distance between various elements of the ExampleData["Audio"] collection:
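A sketch (assuming the ExampleData["Audio"] collection contains at least four items):

```wolfram
audios = ExampleData /@ Take[ExampleData["Audio"], 4];

(* average the MFCC vectors over time to get one feature vector per signal *)
feats = Mean[AudioLocalMeasurements[#, "MFCC"]["Values"]] & /@ audios;

MatrixPlot[DistanceMatrix[feats]]
```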
AudioIntervals extracts the intervals on which a user-defined criterion is satisfied.
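A sketch on a synthetic signal (the amplitude threshold is illustrative):

```wolfram
(* near-silence followed by a tone *)
audio = AudioJoin[{0.001 AudioGenerator["White", 1], AudioGenerator["Sin", 1]}];

(* intervals where the local RMS amplitude exceeds the threshold *)
AudioIntervals[audio, #RMSAmplitude > 0.1 &]
```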
High-Level Analysis
It is possible to use neural network–based functions to gain deeper insight into the contents of a signal.
SpeechRecognize[audio] | recognize the speech in audio and return it as a string |
PitchRecognize[audio] | recognize the main pitch in audio |
AudioIdentify[audio] | identify what audio is the recording of |
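A sketch of the three functions on synthesized test signals (the recognized text and identified categories depend on the installed models):

```wolfram
speech = SpeechSynthesize["hello from the Wolfram Language"];

(* transcribe the speech to a string *)
SpeechRecognize[speech]

(* the main pitch of a pure tone, returned as a TimeSeries *)
PitchRecognize[AudioGenerator["Sin", 1]]

(* identify what the recording is of *)
AudioIdentify[speech]
```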
All the machine learning functions are aware of Audio objects and perform their computations starting with a semantically significant feature extraction.
Classify[{audio1->class1,audio2->class2,…}] | generate a ClassifierFunction[…] trained on the examples and classes given |
FeatureExtraction[{audio1,audio2,…}] | generate a FeatureExtractorFunction[…] trained on the examples given |
FeatureSpacePlot[{audio1,audio2,…}] | plot the features extracted from audioi as a scatter plot |
This preprocessing transforms each audio object into a fixed-size vector, so that they can be easily compared.
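A sketch of training a classifier on a synthetic tone-vs.-noise dataset:

```wolfram
(* synthetic training set: sine tones at various frequencies vs. white noise *)
tones = Table[AudioGenerator[{"Sin", f}, 1] -> "tone", {f, 200, 1000, 100}];
noise = Table[AudioGenerator["White", 1] -> "noise", 9];

cf = Classify[Join[tones, noise]];

(* classify an unseen tone *)
cf[AudioGenerator[{"Sin", 750}, 1]]
```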
Plot a list of signals in a semantically meaningful space with FeatureSpacePlot:
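A sketch using synthetic signals as the collection:

```wolfram
audios = Join[Table[AudioGenerator[{"Sin", f}, 1], {f, 300, 900, 200}],
   Table[AudioGenerator["White", 1], 4]];

FeatureSpacePlot[audios]
```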
Neural Networks
The Audio object is tightly integrated into the powerful neural network framework. NetEncoder provides an easy entry point into neural nets for various high-level constructs such as the Audio object.
"Audio" | encode a signal as a waveform |
"AudioSpectrogram" | encode a signal as a spectrogram |
"AudioMFCC" | encode a signal as mel-frequency cepstral coefficients |
Some of the available audio NetEncoder types.
Different encoders can be used to compute various kinds of features. Some retain all of the information in the original signal (like "Audio" and "AudioSTFT"), while others discard some information but dramatically reduce the dimensionality (like "AudioMFCC").
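A sketch; an encoder can be applied directly to an Audio object to inspect the features it produces:

```wolfram
enc = NetEncoder["AudioMFCC"];
features = enc[AudioGenerator["Sin", 1]];

(* a sequence of MFCC vectors: {numberOfFrames, numberOfCoefficients} *)
Dimensions[features]
```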
Using the encoders, it is easy to train networks from scratch to solve audio-related tasks and produce measurements of the resulting performance.
NetTrain[net,data] | train the network net on the dataset data |
NetChain[{layer1,layer2,…}] | specify a net in which the output of layeri is connected to the input of layeri+1 |
NetMeasurements[net,data,measurement] | compute the requested measurement for the net evaluated on data |
Use NetChain and NetGraph to create networks of arbitrary topology, and leverage sequence-focused layers such as GatedRecurrentLayer and LongShortTermMemoryLayer to analyze variable-length signals.
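A sketch of a tiny tone-vs.-noise classifier trained on synthetic data (the layer sizes, dataset, and training settings are all illustrative):

```wolfram
(* synthetic dataset: random-frequency sine tones vs. white noise *)
data = Join[
   Table[AudioGenerator[{"Sin", RandomReal[{200, 1000}]}, 1] -> "tone", 20],
   Table[AudioGenerator["White", 1] -> "noise", 20]];

(* a recurrent layer reads the MFCC sequence; the last state is classified *)
net = NetChain[
   {GatedRecurrentLayer[32], SequenceLastLayer[], LinearLayer[2], SoftmaxLayer[]},
   "Input" -> NetEncoder["AudioMFCC"],
   "Output" -> NetDecoder[{"Class", {"noise", "tone"}}]];

trained = NetTrain[net, data, MaxTrainingRounds -> 10];
NetMeasurements[trained, data, "Accuracy"]
```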