---
title: "AudioDistance"
language: "en"
type: "Symbol"
summary: "AudioDistance[audio1, audio2] returns a distance measure between audio1 and audio2. AudioDistance[video1, video2] returns a distance measure between the audio tracks of video1 and video2."
keywords: 
- audio distance
- audio similarity
- spectral audio distance
- cepstral audio distance
- Itakura-Saito quasi distance
- pitch distance
- rhythm distance
- acoustics distance
- LPC
- linear predictive coefficients
- MFCC
- mel-frequency cepstral coefficients
canonical_url: "https://reference.wolfram.com/language/ref/AudioDistance.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Audio Representation"
    link: "https://reference.wolfram.com/language/guide/AudioRepresentation.en.md"
  - 
    title: "Audio Analysis"
    link: "https://reference.wolfram.com/language/guide/AudioAnalysis.en.md"
related_functions: 
  - 
    title: "AudioLocalMeasurements"
    link: "https://reference.wolfram.com/language/ref/AudioLocalMeasurements.en.md"
  - 
    title: "SpectrogramArray"
    link: "https://reference.wolfram.com/language/ref/SpectrogramArray.en.md"
  - 
    title: "CepstrogramArray"
    link: "https://reference.wolfram.com/language/ref/CepstrogramArray.en.md"
  - 
    title: "ConformAudio"
    link: "https://reference.wolfram.com/language/ref/ConformAudio.en.md"
  - 
    title: "WarpingDistance"
    link: "https://reference.wolfram.com/language/ref/WarpingDistance.en.md"
  - 
    title: "DistanceMatrix"
    link: "https://reference.wolfram.com/language/ref/DistanceMatrix.en.md"
  - 
    title: "Nearest"
    link: "https://reference.wolfram.com/language/ref/Nearest.en.md"
  - 
    title: "FindClusters"
    link: "https://reference.wolfram.com/language/ref/FindClusters.en.md"
  - 
    title: "Classify"
    link: "https://reference.wolfram.com/language/ref/Classify.en.md"
---
# AudioDistance

AudioDistance[audio1, audio2] returns a distance measure between audio1 and audio2.

AudioDistance[video1, video2] returns a distance measure between the audio tracks of video1 and video2.

## Details and Options

* ``AudioDistance`` computes a dissimilarity measure between audio objects. Depending on the distance function, it compares the waveforms or other features of the signals.

* If ``audio1`` and ``audio2`` have different durations, by default the longer signal is trimmed to the shorter duration before the distance is computed.

* The following options can be specified:

|                       |           |                                           |
| --------------------- | --------- | ----------------------------------------- |
| DistanceFunction      | Automatic | the distance function to use              |
| Masking               | Automatic | the audio intervals to use for comparison |
| PartitionGranularity  | Automatic | audio partitioning specification          |
| SampleRate            | Automatic | sample rate for conforming audioi         |

* By default, using ``DistanceFunction -> Automatic``, the ``EuclideanDistance`` of audio waveforms is computed. Compute other measures using different distance functions or different features.

* The following distance functions are computed from the Fourier transform of ``audioi`` :

|                                    |                                                            |
| ---------------------------------- | ---------------------------------------------------------- |
| "SpectralEuclidean"                | Euclidean applied to the power spectra (default)           |
| "SpectralItakuraSaito"             | maximum likelihood of LPC-derived spectral envelopes       |
| "SpectralMagnitudePhaseDistortion" | the average of magnitude and phase spectral distances      |
| "SpectralRMSLog"                   | Euclidean applied to the log of power spectra              |
| "SpectralFirstOrderDifferential"   | distance between first-order derivatives of power spectra  |
| "SpectralSecondOrderDifferential"  | distance between second-order derivatives of power spectra |
| "Cepstral"                         | Euclidean applied to the power cepstra                     |

* Additional ``DistanceFunction`` settings are also available and can work on different audio features:

|                                    |                                       |
| ---------------------------------- | ------------------------------------- |
| EuclideanDistance                  | Euclidean distance                    |
| SquaredEuclideanDistance           | squared Euclidean distance            |
| NormalizedSquaredEuclideanDistance | normalized squared Euclidean distance |
| RootMeanSquare                     | root mean square distance             |
| ManhattanDistance                  | Manhattan or "city block" distance    |
| CosineDistance                     | angular cosine distance               |
| CorrelationDistance                | correlation coefficient distance      |
| WarpingDistance                    | dynamic time warping (DTW) distance   |
| f                                  | an arbitrary function f               |

* By default, ``WarpingDistance`` is computed from the ``"MFCC"`` features and all other distances are computed from ``"AudioData"``.

* Using ``DistanceFunction -> {method, FeatureExtractor -> f}``, a different feature extractor can be specified.

* Possible settings for ``FeatureExtractor`` include:

|                  |                                                      |
| ---------------- | ---------------------------------------------------- |
| "AudioData"      | audio data                                           |
| "Formants"       | frequencies of the formants of the signal            |
| "LPC"            | linear prediction coefficients                       |
| "MelSpectrogram" | mel-scale audio spectrogram                          |
| "MFCC"           | mel-frequency cepstral coefficients vectors sequence |
| "Novelty"        | estimated measure for significant changes            |
| "Spectrogram"    | spectrogram                                          |

* By default, ``AudioDistance`` is computed on the trimmed signals to the shorter duration.

* Use the ``Masking`` option to compute the distance measure on different intervals. Possible settings include:

|                          |                                                           |
| ------------------------ | --------------------------------------------------------- |
| Automatic                | trim to the shorter duration (default)                    |
| All                      | pad to the longer duration                                |
| {t1, t2}                 | compare the signals between times t1 and t2               |
| {{t11, t12}, {t21, t22}} | t11 to t12 from audio1 compared to t21 to t22 from audio2 |

* Using ``Masking -> {{t11, t12}, {t21, t22}}``, the two intervals should have the same duration.

* ``PartitionGranularity`` is only used with features that work on partitioned audio, like ``"MFCC"``, and ignored otherwise.

* By default, ``SampleRate -> Automatic`` takes the highest sample rate in all ``audioi``.

## Examples (17)

### Basic Examples (1)

Distance between two audio objects:

```wl
In[1]:= AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle"]]

Out[1]= 1.123315681340161*^6
```

### Scope (2)

Distance of two audio signals with different lengths:

```wl
In[1]:=
a = AudioGenerator["Sin", 2];
b = AudioGenerator["Triangle", 1];
AudioDistance[a, b]

Out[1]= 1.123315681340161*^6
```

The longer signal is trimmed to the shorter duration:

```wl
In[2]:= AudioDistance[AudioTrim[a, 1], b]

Out[2]= 1.123315681340161*^6
```

---

Distance of the audio tracks of two videos:

```wl
In[1]:= AudioDistance[Video["ExampleData/fish.mp4"], Video["ExampleData/rule30.mp4"]]

Out[1]= 13351.7
```

### Options (13)

#### DistanceFunction (6)

By default, the ``"SpectralEuclidean"`` distance is used:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b] === AudioDistance[a, b, DistanceFunction -> "SpectralEuclidean"]

Out[1]= True
```

---

Various distances are computed on the sample values of the audio signals:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
distances = {EuclideanDistance, SquaredEuclideanDistance, NormalizedSquaredEuclideanDistance, ManhattanDistance, CosineDistance, CorrelationDistance, RootMeanSquare};

In[2]:=
res = Table[{d, AudioDistance[a, b, DistanceFunction -> d]}, {d, distances}];
TextGrid[res, Frame -> All]

Out[2]=
|                                    |            |
| ---------------------------------- | ---------- |
| EuclideanDistance                  | 31.6842    |
| SquaredEuclideanDistance           | 1003.89    |
| NormalizedSquaredEuclideanDistance | 0.0136583  |
| ManhattanDistance                  | 6024.93    |
| CosineDistance                     | 0.00725918 |
| CorrelationDistance                | 0.00725918 |
| RootMeanSquare                     | 0.150877   |
```

---

Distances computed on the spectrum compare the frequency content rather than the sample values:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
distances = {"SpectralEuclidean", "SpectralItakuraSaito", "SpectralMagnitudePhaseDistortion", "SpectralRMSLog", "SpectralFirstOrderDifferential", "SpectralSecondOrderDifferential", "Cepstral"};

In[2]:=
res = Table[{d, AudioDistance[a, b, DistanceFunction -> d]}, {d, distances}];
TextGrid[res, Frame -> All]

Out[2]=
|                                  |                        |
| -------------------------------- | ---------------------- |
| SpectralEuclidean                | 1.123315681340161*^6   |
| SpectralItakuraSaito             | 7.317652231849968`*^-7 |
| SpectralMagnitudePhaseDistortion | ...                    |
| SpectralRMSLog                   | 3.70166                |
| SpectralFirstOrderDifferential   | 1.5886442954662396*^6  |
| SpectralSecondOrderDifferential  | 2.246733257655664*^6   |
| Cepstral                         | 0.069542               |
```

Phase differences of the signals do not affect the computed spectral distance:

```wl
In[3]:=
spectralDistances = Table[AudioDistance[AudioGenerator[{"Sin", 440, phase}], b], {phase, 0., 2Pi, .2}];
sampleDistances = Table[AudioDistance[AudioGenerator[{"Sin", 440, phase}], b, DistanceFunction -> EuclideanDistance], {phase, 0., 2Pi, .2}];

In[4]:= ListLinePlot[Normalize /@ {spectralDistances, sampleDistances}, DataRange -> {0, 2Pi}, PlotLegends -> {"Spectral Euclidean", "Euclidean"}, AxesLabel -> {"phase", "normalized distance"}]

Out[4]= [image]
```

---

By default, each distance function uses the audio feature most suitable for it:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> EuclideanDistance]

Out[1]= 31.6842
```

Most distances use ``"AudioData"`` as the default feature:

```wl
In[2]:=
distances = {EuclideanDistance, SquaredEuclideanDistance, NormalizedSquaredEuclideanDistance, ManhattanDistance, CosineDistance, CorrelationDistance, RootMeanSquare};
Table[{d, AudioDistance[a, b, DistanceFunction -> d] == AudioDistance[a, b, DistanceFunction -> {d, FeatureExtractor -> "AudioData"}]}, {d, distances}]//TextGrid

Out[2]=
|                                    |      |
| ---------------------------------- | ---- |
| EuclideanDistance                  | True |
| SquaredEuclideanDistance           | True |
| NormalizedSquaredEuclideanDistance | True |
| ManhattanDistance                  | True |
| CosineDistance                     | True |
| CorrelationDistance                | True |
| RootMeanSquare                     | True |
```

---

With ``WarpingDistance``, the ``"MFCC"`` feature is used by default:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> WarpingDistance]

Out[1]= 8048.11

In[2]:= % == AudioDistance[a, b, DistanceFunction -> {WarpingDistance, FeatureExtractor -> "MFCC"}]

Out[2]= True
```
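
Dynamic time warping is typically chosen when two signals are similar but not aligned in time. As a minimal sketch (the time-stretched copy is an assumption made for illustration and is not part of the examples above), compare a recording with a slightly slower copy of itself, using both the warping distance and a plain Euclidean distance on the same ``"MFCC"`` features:

```wl
a = ExampleData[{"Audio", "FemaleVoice"}];
(* assumption for illustration: a copy of the recording stretched to 1.1 times its duration *)
b = AudioTimeStretch[a, 1.1];
(* warping distance (uses "MFCC" by default) next to Euclidean distance on "MFCC" *)
res = {
  AudioDistance[a, b, DistanceFunction -> WarpingDistance],
  AudioDistance[a, b, DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]
}
```

The two signals have different durations, so with the default ``Masking -> Automatic`` the stretched copy is trimmed to the duration of ``a`` before the features are compared.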

---

Specify a different feature:

```wl
In[1]:=
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
AudioDistance[a, b, DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

Out[1]= 1210.19
```

All features other than ``"AudioData"`` are computed from the signal's short-time Fourier transform:

```wl
In[2]:=
features = {"AudioData", "Spectrogram", "MelSpectrogram", "MFCC", "Novelty", "Formants", "LPC"};
Table[{f, AudioDistance[a, b, DistanceFunction -> {EuclideanDistance, FeatureExtractor -> f}]}, {f, features}]//TextGrid[#, Frame -> All]&

Out[2]=
|                |                         |
| -------------- | ----------------------- |
| AudioData      | 31.6842                 |
| Spectrogram    | 1.4077486664713677*^6   |
| MelSpectrogram | 1940.88                 |
| MFCC           | 1210.19                 |
| Novelty        | 3.59511                 |
| Formants       | 132050.                 |
| LPC            | 3.5187349128602424*^111 |
```
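
---

Use an arbitrary function as the distance. This is a minimal sketch: the name ``maxAbsDifference`` is hypothetical, and the sketch assumes that the function is called with the two extracted feature arrays (here the default ``"AudioData"`` samples) as its two arguments:

```wl
{a, b} = {AudioGenerator["Sin"], AudioGenerator["Triangle"]};
(* hypothetical helper: largest absolute sample-wise difference (Chebyshev distance) *)
(* Normal and Flatten are a precaution in case the features arrive as a *)
(* NumericArray or as a channels x samples matrix; this is an assumption *)
maxAbsDifference = Function[{x, y}, Max[Abs[Flatten[Normal[x]] - Flatten[Normal[y]]]]];
d = AudioDistance[a, b, DistanceFunction -> maxAbsDifference]
```

Both generators produce one second of audio at the same sample rate, so the two arrays are expected to have equal length and can be subtracted element by element.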

#### Masking (4)

If two signals have different lengths, the longer is trimmed to the shorter duration:

```wl
In[1]:= {a, b, c} = {AudioGenerator["Sin"], AudioGenerator["Triangle"], AudioGenerator["Sin", 2]};

In[2]:=
AudioDistance[a, b]
AudioDistance[c, b]

Out[2]= 1.123315681340161*^6

Out[2]= 1.123315681340161*^6
```

To compare the full length of the signals, use ``Masking -> All`` :

```wl
In[3]:= AudioDistance[c, b, Masking -> All]

Out[3]= 7.789527672315825*^6
```

---

Use the ``Masking`` option to compare a specific interval of two audio objects:

```wl
In[1]:=
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

In[2]:= AudioDistance[a, b, Masking -> {1, 1.5}]

Out[2]= 6689.23
```

---

As long as the duration of the intervals is the same, they can be chosen from different times:

```wl
In[1]:= a = ExampleData[{"Audio", "FemaleVoice"}]

Out[1]= [audio]

In[2]:=
times = Table[t, {t, 0, 3.9, .1}];
duration = .35;

In[3]:=
dist = Table[
	AudioDistance[a, a, Masking -> {{0, duration}, {t, t + duration}}, DistanceFunction -> "SpectralMagnitudePhaseDistortion"], {t, times}];
ListLinePlot[dist, PlotRange -> All, DataRange -> MinMax[times]]

Out[3]= [image]
```

---

Use ``Masking -> All`` to compare the full length of the signals:

```wl
In[1]:=
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

In[2]:= AudioDistance[a, b, Masking -> All]

Out[2]= 8417.93
```

#### PartitionGranularity (2)

Use the ``PartitionGranularity`` option to control the computation of the features:

```wl
In[1]:=
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

In[2]:= AudioDistance[a, b, PartitionGranularity -> Quantity[46, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

Out[2]= 2441.49

In[3]:= AudioDistance[a, b, PartitionGranularity -> Quantity[92, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "MFCC"}]

Out[3]= 4564.16
```

---

If the selected feature is ``"AudioData"``, the ``PartitionGranularity`` option is ignored:

```wl
In[1]:=
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];

In[2]:= AudioDistance[a, b, PartitionGranularity -> Quantity[46, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "AudioData"}]

Out[2]= 44.5161

In[3]:= AudioDistance[a, b, PartitionGranularity -> Quantity[92, "Milliseconds"], DistanceFunction -> {EuclideanDistance, FeatureExtractor -> "AudioData"}]

Out[3]= 44.5161
```

#### SampleRate (1)

By default, all audio signals are converted to the highest sample rate among them:

```wl
In[1]:= AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle", SampleRate -> 11025]]

Out[1]= 34.092

In[2]:= AudioDistance[AudioGenerator["Sin"], AudioResample[AudioGenerator["Triangle", SampleRate -> 11025], 44100]]

Out[2]= 34.092
```

Use a specific sample rate:

```wl
In[3]:= AudioDistance[AudioGenerator["Sin"], AudioGenerator["Triangle", SampleRate -> 11025], SampleRate -> 11025]

Out[3]= 17.0458
```

### Applications (1)

Distance between different oscillators:

```wl
In[1]:=
waveforms = {"Sin", "Triangle", "Sawtooth", "Square"};
m = DistanceMatrix[AudioGenerator /@ waveforms, DistanceFunction -> AudioDistance];
MatrixPlot[m, FrameTicks -> {Thread[{Range[4], waveforms}]}]

Out[1]= [image]
```

Visually compare the waveforms:

```wl
In[2]:= AudioPlot[AudioGenerator /@ waveforms, PlotRange -> .01]

Out[2]= [image]
```
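
Rank the oscillators by their distance from a query signal. This is a minimal sketch using the default distance; the names ``candidates``, ``query``, ``dists`` and ``ranking`` are illustrative and do not appear elsewhere on this page:

```wl
(* candidate oscillators keyed by waveform name *)
candidates = AssociationMap[AudioGenerator, {"Sin", "Triangle", "Sawtooth", "Square"}];
(* illustrative query signal: a 440 Hz sine *)
query = AudioGenerator[{"Sin", 440}];
(* distance from the query to every candidate, using the default settings *)
dists = AudioDistance[query, #] & /@ candidates;
(* waveform names ordered from smallest to largest distance *)
ranking = Keys[Sort[dists]]
```

The same pattern applies to any collection of recordings: compute the distance to each candidate, then sort by it.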

## See Also

* [`AudioLocalMeasurements`](https://reference.wolfram.com/language/ref/AudioLocalMeasurements.en.md)
* [`SpectrogramArray`](https://reference.wolfram.com/language/ref/SpectrogramArray.en.md)
* [`CepstrogramArray`](https://reference.wolfram.com/language/ref/CepstrogramArray.en.md)
* [`ConformAudio`](https://reference.wolfram.com/language/ref/ConformAudio.en.md)
* [`WarpingDistance`](https://reference.wolfram.com/language/ref/WarpingDistance.en.md)
* [`DistanceMatrix`](https://reference.wolfram.com/language/ref/DistanceMatrix.en.md)
* [`Nearest`](https://reference.wolfram.com/language/ref/Nearest.en.md)
* [`FindClusters`](https://reference.wolfram.com/language/ref/FindClusters.en.md)
* [`Classify`](https://reference.wolfram.com/language/ref/Classify.en.md)

## Related Guides

* [Audio Representation](https://reference.wolfram.com/language/guide/AudioRepresentation.en.md)
* [Audio Analysis](https://reference.wolfram.com/language/guide/AudioAnalysis.en.md)

## History

* [Introduced in 2018 (11.3)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn113.en.md) \| [Updated in 2024 (14.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn141.en.md)