Wolfram Language & System Documentation Center

AudioDistance

AudioDistance[audio₁,audio₂] returns a distance measure between audio₁ and audio₂.

AudioDistance[video₁,video₂] returns a distance measure between the audio tracks of video₁ and video₂.

Details and Options

AudioDistance computes a dissimilarity measure between audio objects. Depending on the distance function, it compares the waveforms or other features of the signals.
If audio₁ and audio₂ have different durations, by default the longer signal is trimmed to the duration of the shorter one before the distance is computed.
The following options can be specified:

DistanceFunction	Automatic	the distance function to use
Masking	Automatic	the audio intervals to use for comparison
PartitionGranularity	Automatic	audio partitioning specification
SampleRate	Automatic	sample rate used to conform all audioᵢ

By default, with DistanceFunction->Automatic, the "SpectralEuclidean" distance is computed. Other measures can be computed by choosing a different distance function or a different feature.
The following distance functions are computed from the Fourier transform of audioᵢ:

	"SpectralEuclidean"	Euclidean applied to the power spectra (default)
	"SpectralItakuraSaito"	maximum-likelihood distance between LPC-derived spectral envelopes
	"SpectralMagnitudePhaseDistortion"	the average of magnitude and phase spectral distances
	"SpectralRMSLog"	Euclidean applied to the log of power spectra
	"SpectralFirstOrderDifferential"	distance between first-order derivatives of power spectra
	"SpectralSecondOrderDifferential"	distance between second-order derivatives of power spectra
	"Cepstral"	Euclidean applied to the power cepstra

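The spectral measures above start from a power spectrum rather than from the samples. The following Python sketch illustrates two of them under stated assumptions: the power spectrum is taken as |DFT|² of the whole signal with no windowing, "SpectralEuclidean" is modeled as the Euclidean distance between power spectra, and "SpectralRMSLog" is assumed to be the RMS difference of natural-log power spectra with a small offset eps. All function names are hypothetical; this is not the Wolfram Language implementation.

```python
import cmath
import math

# Hypothetical illustration of spectral distances; not the Wolfram Language implementation.


def power_spectrum(x):
    """|X[k]|^2 for k = 0..n-1, using a direct (slow, O(n^2)) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def spectral_euclidean(x, y):
    """Euclidean distance between the power spectra of x and y."""
    return euclidean(power_spectrum(x), power_spectrum(y))


def spectral_rms_log(x, y, eps=1e-12):
    """RMS difference of natural-log power spectra; eps (an assumption) avoids log(0)."""
    px, py = power_spectrum(x), power_spectrum(y)
    diffs = [(math.log(a + eps) - math.log(b + eps)) ** 2 for a, b in zip(px, py)]
    return math.sqrt(sum(diffs) / len(diffs))


# Usage: a sine spanning 4 whole periods, and the same sine shifted by 1 radian.
n = 64
x = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
y = [math.sin(2 * math.pi * 4 * t / n + 1.0) for t in range(n)]
print(euclidean(x, y))           # sample-wise distance: clearly nonzero
print(spectral_euclidean(x, y))  # spectral distance: zero up to rounding
```

Because the power spectrum discards phase, the shifted sine has the same spectrum as the original, which is the behavior the phase example under DistanceFunction shows for "SpectralEuclidean".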
Additional DistanceFunction settings, which can be applied to different audio features, include:

	EuclideanDistance	Euclidean distance
	SquaredEuclideanDistance	squared Euclidean distance
	NormalizedSquaredEuclideanDistance	normalized squared Euclidean distance
	RootMeanSquare	root mean square distance
	ManhattanDistance	Manhattan or "city block" distance
	CosineDistance	angular cosine distance
	CorrelationDistance	correlation coefficient distance
	WarpingDistance	dynamic time warping (DTW) distance
	f	an arbitrary function f
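For reference, the sample-based measures in this table can be written out for two equal-length lists of values. The Python sketch below follows the usual textbook definitions; the formula for NormalizedSquaredEuclideanDistance is assumed to match the Wolfram Language reference, and RootMeanSquare is assumed here to mean the RMS of the element-wise difference. The function names are hypothetical and the code illustrates the formulas only, not AudioDistance itself.

```python
import math

# Hypothetical helpers illustrating common vector distances.


def _mean(u):
    return sum(u) / len(u)


def _centered(u):
    m = _mean(u)
    return [a - m for a in u]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    return math.sqrt(squared_euclidean(u, v))


def normalized_squared_euclidean(u, v):
    # Assumed definition: ||cu - cv||^2 / (2 (||cu||^2 + ||cv||^2)) with mean-centred cu, cv
    cu, cv = _centered(u), _centered(v)
    return 0.5 * squared_euclidean(cu, cv) / (_dot(cu, cu) + _dot(cv, cv))


def root_mean_square(u, v):
    # Assumption: RMS of the element-wise difference
    return math.sqrt(squared_euclidean(u, v) / len(u))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine(u, v):
    # 1 minus the cosine of the angle between u and v
    return 1 - _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))


def correlation(u, v):
    # cosine distance of the mean-centred vectors
    return cosine(_centered(u), _centered(v))


u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # v is a scaled copy of u
for f in (euclidean, squared_euclidean, normalized_squared_euclidean,
          root_mean_square, manhattan, cosine, correlation):
    print(f.__name__, f(u, v))
```

Since v is a scaled copy of u, the cosine and correlation distances are 0 while the Euclidean-type distances are not: each measure responds to a different kind of difference between the signals.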

By default, WarpingDistance is computed from the "MFCC" features and all other distances are computed from "AudioData".
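Dynamic time warping aligns two sequences before comparing them, which makes it tolerant of timing differences between frame sequences such as "MFCC". Below is a minimal Python sketch of classic DTW; the step pattern (advance one sequence or both) and the absolute-difference cost are assumptions, the helper name is hypothetical, and WarpingDistance may use other defaults or normalization.

```python
import math


def dtw_distance(x, y, cost=lambda a, b: abs(a - b)):
    """Hypothetical helper: classic dynamic time warping distance.

    d[i][j] holds the cheapest accumulated cost of aligning x[:i] with y[:j];
    each step may advance x, y, or both (the usual symmetric step pattern).
    """
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(x[i - 1], y[j - 1])
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Usage: a slowed-down copy aligns perfectly, a reversed one does not.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
print(dtw_distance([0, 1], [1, 0]))                 # 2.0
# Frame sequences (for example MFCC vectors) can use a vector cost:
print(dtw_distance([[0, 0], [1, 1]], [[0, 0], [0, 0], [1, 1]], cost=math.dist))  # 0.0
```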
Using DistanceFunction->{method,FeatureExtractor->f}, a different feature extractor can be specified.
Possible settings for FeatureExtractor include:

	"AudioData"	audio data
	"Formants"	frequencies of the formants of the signal
	"LPC"	linear prediction coefficients
	"MelSpectrogram"	mel-scale audio spectrogram
	"MFCC"	sequence of mel-frequency cepstral coefficient vectors
	"Novelty"	estimated measure of significant changes in the signal
	"Spectrogram"	spectrogram

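Conceptually, DistanceFunction->{method,FeatureExtractor->f} extracts the same feature from both signals and then applies method to the results. The Python sketch below shows that composition with a hypothetical frame_rms feature (the RMS energy of non-overlapping frames) standing in for the built-in extractors; the identity feature plays the role of "AudioData". All names here are hypothetical.

```python
import math

# Hypothetical sketch of "feature extractor, then distance"; not the Wolfram Language implementation.


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identity(x):
    # plays the role of "AudioData": compare the raw samples
    return list(x)


def frame_rms(x, frame_len=2):
    # hypothetical feature: RMS of consecutive non-overlapping frames
    return [
        math.sqrt(sum(s * s for s in x[i:i + frame_len]) / frame_len)
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]


def feature_distance(x, y, method=euclidean, feature=identity):
    # extract the same feature from both signals, then compare the features
    return method(feature(x), feature(y))


x = [1.0, -1.0, 1.0, -1.0]
y = [-1.0, 1.0, -1.0, 1.0]  # same signal with inverted polarity
print(feature_distance(x, y))                     # 4.0 on raw samples
print(feature_distance(x, y, feature=frame_rms))  # 0.0 on frame energies
```

The choice of feature decides which differences count: inverting the polarity changes every sample but leaves the frame energies unchanged.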
By default, AudioDistance compares the signals only up to the end of the shorter one.
Use the Masking option to compute the distance measure over other intervals. Possible settings include:

	Automatic	trim to the shorter duration (default)
	All	pad to the longer duration
	{t₁,t₂}	compare the signals between times t₁ and t₂
	{{t₁₁,t₁₂},{t₂₁,t₂₂}}	t₁₁ to t₁₂ from audio₁ compared to t₂₁ to t₂₂ from audio₂

With Masking->{{t₁₁,t₁₂},{t₂₁,t₂₂}}, the two intervals should have the same duration, that is, t₁₂ - t₁₁ = t₂₂ - t₂₁.
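The Masking settings can be summarized as a rule for selecting the two sample ranges to compare. This Python sketch is hypothetical: it treats times as seconds, rounds them to sample indices, spells Automatic and All as strings, and assumes that Masking->All pads with zeros, since the padding value is not stated above.

```python
import math


def mask_pair(x, y, rate, masking="Automatic"):
    """Hypothetical helper: return the segments of sample lists x and y to compare.

    masking mirrors the Masking settings:
      "Automatic"                 trim both to the shorter length
      "All"                       zero-pad both to the longer length (assumed padding)
      (t1, t2)                    the same interval, in seconds, from both signals
      ((t11, t12), (t21, t22))    one interval per signal; durations must match
    """
    if masking == "Automatic":
        n = min(len(x), len(y))
        return x[:n], y[:n]
    if masking == "All":
        n = max(len(x), len(y))
        return x + [0.0] * (n - len(x)), y + [0.0] * (n - len(y))
    first, second = masking
    if isinstance(first, (int, float)):
        interval = (first, second)  # a single {t1, t2} applies to both signals
        first, second = interval, interval
    (t11, t12), (t21, t22) = first, second
    if not math.isclose(t12 - t11, t22 - t21):
        raise ValueError("the two intervals must have the same duration")
    start1, start2 = round(t11 * rate), round(t21 * rate)
    length = round((t12 - t11) * rate)
    return x[start1:start1 + length], y[start2:start2 + length]


x, y = list(range(10)), list(range(6))  # 5 s and 3 s at a toy rate of 2 samples/s
print(mask_pair(x, y, 2))                    # both trimmed to 6 samples
print(mask_pair(x, y, 2, ((0, 1), (2, 3))))  # ([0, 1], [4, 5])
```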
PartitionGranularity is only used with features that work on partitioned audio, like "MFCC", and ignored otherwise.
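Frame-based features are computed on short partitions of the signal, so the partition size controls how many feature vectors are produced and how much of the signal each one summarizes. The hypothetical partition helper below is a Python sketch, not the Wolfram Language implementation: it converts a frame duration in seconds into a frame length in samples, supports an optional hop, and drops an incomplete final frame (an assumption).

```python
def partition(x, sample_rate, frame_seconds, hop_seconds=None):
    """Hypothetical helper: split samples x into frames of frame_seconds.

    Frames advance by hop_seconds; with hop_seconds=None they do not overlap.
    A trailing run of samples shorter than one frame is dropped (an assumption).
    """
    frame_len = max(1, round(frame_seconds * sample_rate))
    hop = frame_len if hop_seconds is None else max(1, round(hop_seconds * sample_rate))
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


one_second = [0.0] * 44100  # 1 s of silence at 44100 Hz
print(len(partition(one_second, 44100, 0.046)))  # 46 ms frames
print(len(partition(one_second, 44100, 0.092)))  # 92 ms frames: about half as many
```

With "AudioData" there is no partitioning step, which is why the PartitionGranularity option has no effect for that feature.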
By default, SampleRate->Automatic uses the highest sample rate among all audioᵢ.
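As a sketch of what conforming sample rates involves, the Python code below picks the highest rate among the inputs, as SampleRate->Automatic does, and upsamples the other signals to it. Linear interpolation is a deliberately simple stand-in chosen for illustration; it is not the resampling method used by the Wolfram Language, and both helper names are hypothetical.

```python
def resample_linear(x, rate, target_rate):
    """Hypothetical helper: resample x from rate to target_rate by linear interpolation."""
    if rate == target_rate:
        return list(x)
    n_out = round(len(x) * target_rate / rate)
    out = []
    for k in range(n_out):
        pos = k * rate / target_rate  # fractional index into x
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[-1]  # hold the last sample at the end
        out.append(x[i] + frac * (nxt - x[i]))
    return out


def conform_sample_rates(signals):
    """Hypothetical helper: signals is a list of (samples, rate); resample all to the highest rate."""
    target = max(rate for _, rate in signals)
    return target, [resample_linear(x, rate, target) for x, rate in signals]


rate, (a, b) = conform_sample_rates([([0, 1, 2, 3], 2), ([5] * 8, 4)])
print(rate)  # 4
print(a)     # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Because the target is the highest rate, only upsampling occurs, so this sketch does not need the low-pass filtering that downsampling would require.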

Examples


Basic Examples (1)

Distance between two audio objects:

Wolfram Language code: AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle"]]

Scope (2)

Distance of two audio signals with different lengths:

Wolfram Language code:

a = AudioGenerator["Sin", 2];
b = AudioGenerator["Triangle", 1];
AudioDistance[a, b]

The longer signal is trimmed to the shorter duration:

Wolfram Language code: AudioDistance[AudioTrim[a, 1], b]

Distance of the audio tracks of two videos:

Wolfram Language code: AudioDistance[Video["ExampleData/fish.mp4"], Video["ExampleData/rule30.mp4"]]

Options (13)

DistanceFunction (6)

By default, the "SpectralEuclidean" distance is used:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b] === AudioDistance[a, b, DistanceFunction -> "SpectralEuclidean"]

Various distances are computed on the sample values of the audio signals:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
distances = {EuclideanDistance, SquaredEuclideanDistance, NormalizedSquaredEuclideanDistance, ManhattanDistance, CosineDistance, CorrelationDistance, RootMeanSquare};

Wolfram Language code:

res = Table[{d, AudioDistance[a, b, DistanceFunction -> d]}, {d, distances}];
TextGrid[res, Frame -> All]

Distances computed on the spectrum compare the frequency content rather than the sample values:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
distances = {"SpectralEuclidean", "SpectralItakuraSaito", "SpectralMagnitudePhaseDistortion", "SpectralRMSLog", "SpectralFirstOrderDifferential", "SpectralSecondOrderDifferential", "Cepstral"};

Wolfram Language code:

res = Table[{d, AudioDistance[a, b, DistanceFunction -> d]}, {d, distances}];
TextGrid[res, Frame -> All]

Phase differences of the signals do not affect the computed spectral distance:

Wolfram Language code:

b = AudioGenerator["Triangle"];
spectralDistances = Table[AudioDistance[AudioGenerator[{"Sin", 440, phase}], b], {phase, 0., 2Pi, .2}];
sampleDistances = Table[AudioDistance[AudioGenerator[{"Sin", 440, phase}], b, DistanceFunction -> EuclideanDistance], {phase, 0., 2Pi, .2}];

Wolfram Language code:

ListLinePlot[Normalize /@ {spectralDistances, sampleDistances}, DataRange -> {0, 2Pi}, PlotLegends -> {"Spectral Euclidean", "Euclidean"}, AxesLabel -> {"phase", "normalized distance"}]

By default, each distance function is computed on the audio feature best suited to it:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> EuclideanDistance]

Most distances use "AudioData" as the default feature:

Wolfram Language code:

distances = {EuclideanDistance, SquaredEuclideanDistance, NormalizedSquaredEuclideanDistance, ManhattanDistance, CosineDistance, CorrelationDistance, RootMeanSquare};
Table[{d, AudioDistance[a, b, DistanceFunction -> d] == AudioDistance[a, b, DistanceFunction -> {d, FeatureExtractor -> "AudioData"}]}, {d, distances}]//TextGrid

With WarpingDistance, the "MFCC" feature is used by default:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> WarpingDistance]

Wolfram Language code: % == AudioDistance[a, b, DistanceFunction -> {WarpingDistance, FeatureExtractor -> "MFCC"}]

Specify a different feature:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

All features other than "AudioData" are computed from the signal's short-time Fourier transform:

Wolfram Language code:

{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
features = {"AudioData", "Spectrogram", "MelSpectrogram", "MFCC", "Novelty", "Formants", "LPC"};
TextGrid[Table[{f, AudioDistance[a, b, DistanceFunction -> {EuclideanDistance, FeatureExtractor -> f}]}, {f, features}], Frame -> All]

Masking (4)

If two signals have different lengths, the longer one is trimmed to the shorter duration:

Wolfram Language code: {a, b, c} = {AudioGenerator["Sin"], AudioGenerator["Triangle"], AudioGenerator["Sin", 2]};

Wolfram Language code:

AudioDistance[a, b]
AudioDistance[c, b]

To compare the full length of the signals, use Masking->All:

Wolfram Language code: AudioDistance[c, b, Masking -> All]

Use the Masking option to compare a specific interval of two audio objects:

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

Wolfram Language code: AudioDistance[a, b, Masking -> {1, 1.5}]

The two intervals can be taken from different times, as long as they have the same duration:

Wolfram Language code: a = ExampleData[{"Audio", "FemaleVoice"}]

Wolfram Language code:

times = Table[t, {t, 0, 3.9, .1}];
duration = .35;

Wolfram Language code:

dist = Table[
	AudioDistance[a, a, Masking -> {{0, duration}, {t, t + duration}}, DistanceFunction -> "SpectralMagnitudePhaseDistortion"], {t, times}];
ListLinePlot[dist, PlotRange -> All, DataRange -> MinMax[times]]

Use MaskingAll to compare the full length of the signals:

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

Wolfram Language code: AudioDistance[a, b, Masking -> All]

PartitionGranularity (2)

Use the PartitionGranularity option to control the partition size used to compute frame-based features such as "MFCC":

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

Wolfram Language code:

AudioDistance[a, b, PartitionGranularity -> Quantity[46, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

Wolfram Language code:

AudioDistance[a, b, PartitionGranularity -> Quantity[92, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

If the selected feature is "AudioData", the PartitionGranularity option is ignored:

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

Wolfram Language code:

AudioDistance[a, b, PartitionGranularity -> Quantity[46, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "AudioData"}]

Wolfram Language code:

AudioDistance[a, b, PartitionGranularity -> Quantity[92, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "AudioData"}]

SampleRate (1)

By default, all audio signals are converted to the highest sample rate among them:

Wolfram Language code: AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle", SampleRate -> 11025]]

Wolfram Language code: AudioDistance[AudioGenerator["Sin"], AudioResample[AudioGenerator["Triangle", SampleRate -> 11025], 44100]]

Use a specific sample rate:

Wolfram Language code: AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle", SampleRate -> 11025], SampleRate -> 11025]

Applications (1)

Distance between different oscillators:

Wolfram Language code:

waveforms = {"Sin", "Triangle", "Sawtooth", "Square"};
m = DistanceMatrix[AudioGenerator /@ waveforms, DistanceFunction -> AudioDistance];
MatrixPlot[m, FrameTicks -> {Thread[{Range[4], waveforms}]}]
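The same construction can be sketched outside the Wolfram Language: generate one signal per waveform and fill a matrix with pairwise distances. The Python below is a hypothetical stand-in. Its waveform formulas (peak amplitude 1, a triangle aligned with the sine) are assumptions rather than AudioGenerator's definitions, and it uses a sample-wise Euclidean distance instead of AudioDistance's default spectral distance.

```python
import math

# Hypothetical stand-in for the oscillator comparison; not AudioGenerator or AudioDistance.


def oscillator(kind, n=64, periods=4):
    """n samples of a unit-amplitude waveform containing `periods` whole cycles."""
    out = []
    for t in range(n):
        p = (t * periods / n) % 1.0  # position within the current cycle, in [0, 1)
        if kind == "Sin":
            out.append(math.sin(2 * math.pi * p))
        elif kind == "Triangle":
            out.append(1 - 4 * abs((p + 0.25) % 1.0 - 0.5))  # peaks where the sine does
        elif kind == "Sawtooth":
            out.append(2 * p - 1)
        elif kind == "Square":
            out.append(1.0 if p < 0.5 else -1.0)
        else:
            raise ValueError(f"unknown waveform {kind!r}")
    return out


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(signals, distance):
    """Square matrix whose (i, j) entry is distance(signals[i], signals[j])."""
    return [[distance(a, b) for b in signals] for a in signals]


waveforms = ["Sin", "Triangle", "Sawtooth", "Square"]
m = distance_matrix([oscillator(w) for w in waveforms], euclidean)
for name, row in zip(waveforms, m):
    print(f"{name:>8}", " ".join(f"{d:6.2f}" for d in row))
```

As in the Wolfram Language example, a smaller entry means two waveforms are more alike; in this sketch the sine is much closer to the triangle than to the square.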

Visually compare the waveforms:

Wolfram Language code: AudioPlot[AudioGenerator /@ waveforms, PlotRange -> .01]
