Audio Basics

The Wolfram Language provides built-in support for both programmatic and interactive audio processing, fully integrated with its mathematical and algorithmic capabilities. You can create and import sound files, manipulate them with built-in functions, apply linear and nonlinear filters, and visualize them in several ways.

Audio Creation and Representation

An audio object can be created from numerical arrays, files, and URLs.

Audio[data]              in-core audio with samples given by data
Audio["file"]            out-of-core audio from a file
Audio["url"]             out-of-core audio from a URL
Import["file"]           import audio from a file
AudioGenerator[model]    generate various oscillators and noises

Audio creation functions.

The simplest way to create an audio object is to wrap the Audio constructor around a vector of real values ranging from -1 to 1.

Here is a one-channel in-core audio object created from a vector of numbers.
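As a minimal sketch of this, the following builds one second of a 440 Hz sine wave and wraps it in Audio. The frequency, the 8000 Hz sample rate, and the variable names are illustrative assumptions, not the original example.

```wolfram
(* 8000 samples of a 440 Hz sine; every value lies between -1 and 1 *)
samples = Table[Sin[2 Pi 440 n/8000.], {n, 0, 7999}];

(* wrap the vector in Audio; SampleRate overrides the default rate *)
audio = Audio[samples, SampleRate -> 8000]
```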

Another way is to obtain an out-of-core audio object from a file on the local file system or any accessible remote location. Out-of-core audio objects do not store samples in memory.

This creates an out-of-core audio object from the Wolfram Language documentation directory ExampleData and displays the number of bytes used internally by the object.
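One plausible form of this example is sketched below. It assumes that the sample file ExampleData/rule30.wav from the documentation directory is available on your system; the variable names are hypothetical.

```wolfram
(* locate the documentation sample file (assumed to be installed) *)
file = FindFile["ExampleData/rule30.wav"];

(* reference the file in place; the samples stay on disk *)
a = Audio[file]

(* bytes used by the object itself *)
ByteCount[a]
```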

You can use Import to create an in-core audio object from a file or URL.

This creates an in-core audio object and displays the number of bytes used internally by the object.
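A sketch of the in-core counterpart, again assuming the sample file ExampleData/rule30.wav is installed. The "Audio" import element is requested explicitly so that the result is an Audio object.

```wolfram
(* read all samples into memory *)
b = Import["ExampleData/rule30.wav", "Audio"];

(* the byte count now includes the sample data *)
ByteCount[b]
```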

The "Audio" collection from ExampleData contains sample audio clips.

This lists available audio clips.
This lists the properties of a given audio object.
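The two steps above might look like the following sketch. To avoid assuming any particular clip name, it simply takes the first entry of the collection.

```wolfram
(* all clips in the "Audio" collection, abbreviated for display *)
clips = ExampleData["Audio"];
Short[clips]

(* properties available for the first clip in the list *)
ExampleData[First[clips], "Properties"]
```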

All types of Sound objects can be converted to Audio.

Here is an example that converts MIDI notes to audio.
See the waveform of the generated audio.
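A minimal sketch of the conversion and the waveform plot. The three notes are an arbitrary choice made for illustration.

```wolfram
(* a short sequence of MIDI notes *)
notes = Sound[{SoundNote["C"], SoundNote["E"], SoundNote["G"]}];

(* synthesize the notes into sampled audio *)
a = Audio[notes]

(* inspect the resulting waveform *)
AudioPlot[a]
```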
This converts sound represented by SampledSoundFunction to audio.

Various oscillators and noises can be created using AudioGenerator.

This creates sinusoidal audio.
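For instance, a sine oscillator could be generated as follows; the 440 Hz frequency and the 2-second duration are illustrative values.

```wolfram
(* two seconds of a 440 Hz sine wave *)
AudioGenerator[{"Sin", 440}, 2]
```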
This creates one second of pink noise.
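A sketch of the corresponding call, with the duration given in seconds as the second argument:

```wolfram
(* one second of pink noise *)
AudioGenerator["Pink", 1]
```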

Audio Properties

Useful properties of an audio object can be obtained by calling the following functions.

AudioLength[audio]        give the number of samples of an audio object
Duration[audio]           give the duration of audio in seconds
AudioChannels[audio]      give the number of channels present in the data for audio
AudioSampleRate[audio]    give the sample rate associated with audio
AudioType[audio]          give the type of values used for each sample element in audio

Audio properties.

This returns audio length, duration, number of channels, sample rate, and data type.
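The property functions can be sketched on a generated tone, which stands in here for the example audio used originally:

```wolfram
(* stand-in audio: two seconds of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 2];

(* length in samples, duration, channel count, sample rate, sample type *)
{AudioLength[a], Duration[a], AudioChannels[a], AudioSampleRate[a], AudioType[a]}
```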

The array of sample values can be extracted using the function AudioData. By default, the function returns real values, but you can ask for a specific type using the optional "type" argument.

This returns a fragment of the in-core audio as a vector of real values scaled to the range -1 to 1.
Here is the same fragment extracted from the out-of-core audio as a vector of integers in the range -128 to 128.
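A hedged sketch of both extractions, using a generated tone in place of the original audio objects. The sample positions are arbitrary, and the type name "SignedInteger8" is an assumption about which integer type the original example requested.

```wolfram
(* stand-in audio: one second of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 1];

(* AudioData returns one row per channel; take ten samples of channel 1 *)
AudioData[a][[1, 1001 ;; 1010]] // MatrixForm

(* the same fragment as 8-bit signed integers (assumed type name) *)
AudioData[a, "SignedInteger8"][[1, 1001 ;; 1010]] // MatrixForm
```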

In the case of multichannel audio, the raw sample data is represented by a list of channel values (2D array).
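The 2D layout can be seen on a small in-core stereo object. This sketch builds the object from two generated tones rather than from a file, so it only illustrates the shape of the data.

```wolfram
(* two mono tones combined into one stereo object *)
stereo = AudioChannelCombine[{
    AudioGenerator[{"Sin", 440}, 1],
    AudioGenerator[{"Sin", 660}, 1]}];

(* dimensions are {number of channels, number of samples} *)
Dimensions[AudioData[stereo]]

(* first five samples of every channel *)
AudioData[stereo][[All, 1 ;; 5]]
```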

This creates an out-of-core multichannel audio object and extracts a fragment of that audio object as a list of channel values.

A multichannel audio object can be split into a list of single-channel audio objects; conversely, a multichannel audio object can be created from any number of single-channel audio objects.

This splits the example stereo audio object into two mono audio objects.
This creates the stereo audio object from two mono audio objects.
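Splitting and recombining channels might be sketched as follows, with two generated tones standing in for the example stereo audio:

```wolfram
(* two mono sources (illustrative frequencies) *)
left = AudioGenerator[{"Sin", 440}, 1];
right = AudioGenerator[{"Sin", 660}, 1];

(* combine the mono objects into one stereo object *)
stereo = AudioChannelCombine[{left, right}];

(* split the stereo object back into a list of mono objects *)
AudioChannelSeparate[stereo]
```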

Audio Visualization

Audio can be visualized in a variety of ways.

AudioPlot[audio]      plot the waveform of audio
Periodogram[audio]    plot the squared magnitude of the discrete Fourier transform (power spectrum) of audio
Spectrogram[audio]    plot the spectrogram of audio

Audio visualization functions.

This plots the waveform of an audio signal.
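A waveform plot can be sketched on a short burst of generated noise, used here as a stand-in for the example audio:

```wolfram
(* half a second of pink noise *)
a = AudioGenerator["Pink", 0.5];

(* amplitude against time *)
AudioPlot[a]
```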
This highlights silent parts of an audio signal.
This plots the power spectrum of an audio signal.
This plots the spectrogram of an audio signal using partitions of length 512 and an offset of 64.
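The two frequency-domain views might be produced as in this sketch, which uses a generated tone in place of the example audio:

```wolfram
(* stand-in audio: one second of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 1];

(* power spectrum of the whole signal *)
Periodogram[a]

(* spectrogram from partitions of 512 samples, advanced by 64 samples *)
Spectrogram[a, 512, 64]
```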

Basic Operations

Many useful audio processing tasks require nothing more than simple arithmetic operations between two audio objects or between an audio object and a constant. For example, you can change the volume by multiplying an audio object by a constant factor, or add a DC offset by adding a constant to (or subtracting one from) an audio object. For this purpose, all Wolfram Language operators and functions with the attribute NumericFunction or Listable are overloaded to work with audio objects.

audio1+audio2      add two audio objects
n*audio            multiply a scalar by an audio object
Mean[{a1,a2,…}]    compute the mean of a list of audio objects

Some arithmetical and statistical operations on audio objects.

This creates a linear combination of two audio objects.
This computes the mean of a list of audio objects.
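Both operations can be sketched with two generated signals of equal duration. The weights 0.7 and 0.3 are arbitrary.

```wolfram
(* two one-second sources *)
a = AudioGenerator[{"Sin", 440}, 1];
b = AudioGenerator["Pink", 1];

(* linear combination: mostly tone, with some noise mixed in *)
0.7 a + 0.3 b

(* sample-by-sample mean of the two objects *)
Mean[{a, b}]
```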
This computes statistics of an audio object.

A system option called "IndeterminateValue" is used to replace values that can result from an arithmetic operation but cannot be stored in an audio object. These include ComplexInfinity and Indeterminate.

This displays the current value of the "IndeterminateValue" option and changes it to 2.
This shows how "IndeterminateValue" is used to replace unacceptable values in the result of an arithmetical operation.
Complex numbers in the result of an arithmetical operation are replaced by the real parts of these values.

Consider the audio manipulation operations that change the audio duration by trimming, deleting, or padding. These operations serve a variety of useful purposes. Trimming and deleting allow you to create a new audio object from a selected portion of a larger one, while padding is typically used to extend an audio object at the ends to ensure uniform treatment of the end samples in many audio processing tasks.

AudioTrim[audio,{t1,t2}]           give an audio object consisting of only samples between t1 and t2
AudioPad[audio,m]                  pad audio on both sides with m seconds of zeros
AudioDelete[audio,{t1,t2}]         delete from time t1 to time t2
AudioSplit[audio,{t1,t2,…}]        split audio at times ti
AudioPartition[audio,dur,offset]   partition audio into overlapping segments
AudioIntervals[audio,crit]         return intervals of audio for which criterion crit is satisfied

Some basic audio operations.

This selects the first second of an audio object.
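A sketch of this selection, using a four-second generated tone as a stand-in for the example audio:

```wolfram
(* stand-in audio: four seconds of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 4];

(* keep only the samples between 0 and 1 second *)
AudioTrim[a, {0, 1}]
```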
This shows three different padding methods applied to the end of an audio object.
This trims the silence from the beginning and the end of an audio object.
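Silence trimming can be sketched by first surrounding a tone with silence and then removing it again. The half-second pads are an arbitrary choice.

```wolfram
(* a tone with half a second of silence added before and after it *)
tone = AudioGenerator[{"Sin", 440}, 1];
padded = AudioPad[tone, {0.5, 0.5}];

(* with no interval given, AudioTrim removes leading and trailing silence *)
trimmed = AudioTrim[padded];

(* compare durations before and after trimming *)
{Duration[padded], Duration[trimmed]}
```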
This deletes samples between the first and the third seconds of an audio object.
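A sketch of the deletion on a four-second generated tone, which stands in for the example audio:

```wolfram
(* stand-in audio: four seconds of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 4];

(* remove everything between 1 and 3 seconds *)
shortened = AudioDelete[a, {1, 3}];

(* durations before and after the deletion *)
{Duration[a], Duration[shortened]}
```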
This splits an audio object in half.
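Splitting at the midpoint might look like this sketch, where the split time is written out as a number of seconds:

```wolfram
(* stand-in audio: four seconds of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 4];

(* split at 2 seconds, the midpoint of this particular signal *)
AudioSplit[a, 2]
```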
This partitions an audio object into non-overlapping fragments.
This partitions an audio object into overlapping fragments.
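Both kinds of partition can be sketched on a generated tone. The 1-second segment length and 0.5-second offset are illustrative.

```wolfram
(* stand-in audio: four seconds of a 440 Hz sine *)
a = AudioGenerator[{"Sin", 440}, 4];

(* non-overlapping 1-second fragments *)
AudioPartition[a, 1]

(* 1-second fragments that start every 0.5 seconds, so neighbors overlap *)
AudioPartition[a, 1, 0.5]
```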

Intervals that satisfy various criteria can be computed and used to extract corresponding blocks of an audio object.

This computes the intervals where the maximal value is greater than 0.1 and the value of the spectral centroid is smaller than 800.
This returns audio objects corresponding to the intervals computed above.
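The two steps above can be sketched on a synthetic signal: a low tone followed by a high tone, so that only part of the signal should satisfy the centroid condition. The test signal is an assumption made for illustration, and the criterion is written with named slots for the measured properties.

```wolfram
(* test signal: one second at 440 Hz followed by one second at 2000 Hz *)
a = AudioJoin[{
    AudioGenerator[{"Sin", 440}, 1],
    AudioGenerator[{"Sin", 2000}, 1]}];

(* intervals that are loud enough and have a low spectral centroid *)
intervals = AudioIntervals[a, #Max > 0.1 && #SpectralCentroid < 800 &]

(* extract one audio object per interval *)
AudioTrim[a, #] & /@ intervals
```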

It is frequently necessary to change the sample rate of an audio object by resampling, or to normalize audio samples in some manner. Functions that perform these basic tasks are readily available.

AudioResample[audio,sr]    give a resampled audio object that has sample rate sr
AudioNormalize[audio]      normalize audio so that the maximum absolute value of its samples is 1
AudioAmplify[audio,s]      multiply all samples of the audio by a factor s

Audio resampling, normalizing, and amplifying.

This changes the sample rate of the example audio to 8000.
This normalizes audio samples.
This increases the volume of an audio object.
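The three operations above might be sketched as follows. A quiet generated tone stands in for the example audio, and the factors 0.2 and 2 are arbitrary.

```wolfram
(* stand-in audio: a quiet one-second tone *)
a = 0.2 AudioGenerator[{"Sin", 440}, 1];

(* change the sample rate to 8000 *)
AudioSampleRate[AudioResample[a, 8000]]

(* rescale so that the largest absolute sample value is 1 *)
AudioNormalize[a]

(* double every sample value, which raises the volume *)
AudioAmplify[a, 2]
```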

The Wolfram Language also provides functions that operate on lists of audio objects.

AudioJoin[list]       concatenate a list of audio objects
AudioOverlay[list]    overlay a list of audio objects
ConformAudio[list]    conform properties of each audio object from the list

Joining, overlaying, and conforming lists of audio objects.

This joins two audio signals.
This overlays two audio signals.
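Joining and overlaying can be sketched with two generated signals, which stand in for the example clips:

```wolfram
(* two one-second sources *)
a = AudioGenerator[{"Sin", 440}, 1];
b = AudioGenerator["Pink", 1];

(* play a, then b: the durations add up *)
Duration[AudioJoin[{a, b}]]

(* play a and b at the same time *)
AudioOverlay[{a, b}]
```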
This conforms two audio signals.
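A hedged sketch of conforming, comparing a few properties before and after. The helper props and the two test signals are hypothetical names and values introduced for illustration.

```wolfram
(* two signals that differ in sample rate and duration *)
a = AudioGenerator[{"Sin", 440}, 1];
b = AudioResample[AudioGenerator["Pink", 2], 8000];

(* helper: collect a few properties of an audio object *)
props[x_] := {AudioSampleRate[x], AudioChannels[x], AudioType[x]};

(* properties before conforming *)
props /@ {a, b} // TableForm

(* properties after conforming *)
props /@ ConformAudio[{a, b}] // TableForm
```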
