---
title: "AudioLocalMeasurements"
language: "en"
type: "Symbol"
summary: "AudioLocalMeasurements[audio, prop] computes the property prop locally for partitions of audio. AudioLocalMeasurements[audio, {prop1, prop2, ...}] computes several properties propi. AudioLocalMeasurements[audio, prop, format] returns the measurements in the specified output format. AudioLocalMeasurements[video, ...] computes the measurements from the first audio track in video."
keywords: 
- audio measurements
- audio metering
- audio analysis
- audio local measurements
- audio properties
- audio features
- sound measurements
- loudness
- MFCC
- LPC
- audio local power
- audio local measures
- audio analyzer
canonical_url: "https://reference.wolfram.com/language/ref/AudioLocalMeasurements.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Audio Processing"
    link: "https://reference.wolfram.com/language/guide/AudioProcessing.en.md"
  - 
    title: "Signal Processing"
    link: "https://reference.wolfram.com/language/guide/SignalProcessing.en.md"
  - 
    title: "Sound and Sonification"
    link: "https://reference.wolfram.com/language/guide/SoundAndSonification.en.md"
  - 
    title: "Video Computation: Update History"
    link: "https://reference.wolfram.com/language/guide/VideoComputation-UpdateHistory.en.md"
  - 
    title: "Audio Representation"
    link: "https://reference.wolfram.com/language/guide/AudioRepresentation.en.md"
  - 
    title: "Video Analysis"
    link: "https://reference.wolfram.com/language/guide/VideoAnalysis.en.md"
  - 
    title: "Speech Computation"
    link: "https://reference.wolfram.com/language/guide/SpeechComputation.en.md"
  - 
    title: "Signal Visualization & Analysis"
    link: "https://reference.wolfram.com/language/guide/SignalAnalysis.en.md"
  - 
    title: "Audio Analysis"
    link: "https://reference.wolfram.com/language/guide/AudioAnalysis.en.md"
related_functions: 
  - 
    title: "AudioIntervals"
    link: "https://reference.wolfram.com/language/ref/AudioIntervals.en.md"
  - 
    title: "AudioAnnotate"
    link: "https://reference.wolfram.com/language/ref/AudioAnnotate.en.md"
  - 
    title: "PitchRecognize"
    link: "https://reference.wolfram.com/language/ref/PitchRecognize.en.md"
  - 
    title: "SpeechRecognize"
    link: "https://reference.wolfram.com/language/ref/SpeechRecognize.en.md"
  - 
    title: "AudioDistance"
    link: "https://reference.wolfram.com/language/ref/AudioDistance.en.md"
  - 
    title: "AudioPartition"
    link: "https://reference.wolfram.com/language/ref/AudioPartition.en.md"
  - 
    title: "AudioBlockMap"
    link: "https://reference.wolfram.com/language/ref/AudioBlockMap.en.md"
  - 
    title: "AudioMeasurements"
    link: "https://reference.wolfram.com/language/ref/AudioMeasurements.en.md"
---
# AudioLocalMeasurements

AudioLocalMeasurements[audio,"prop"]
computes the property "prop" locally for partitions of audio.

AudioLocalMeasurements[audio,{"prop1","prop2",…}]
computes several properties "propi".

AudioLocalMeasurements[audio,"prop",format]
returns the measurements in the specified output format.

AudioLocalMeasurements[video,…]
computes the measurements from the first audio track in video.

## Details and Options

* Audio local measurements are also known as audio features or descriptors.

* ``AudioLocalMeasurements`` returns a ``TimeSeries`` containing one measurement for each partition.

[image]

* Measurements are computed on the average channel values.

* Basic histogram properties:

|                     |                                     |
| ------------------- | ----------------------------------- |
| "Max"               | maximum value                       |
| "MaxAbs"            | maximum absolute value              |
| "Min"               | minimum value                       |
| "MinAbs"            | minimum absolute value              |
| "MinMax"            | minimum and maximum values          |
| "MinMaxAbs"         | minimum and maximum absolute values |
| "Mean"              | mean value                          |
| "Median"            | median value                        |
| "StandardDeviation" | standard deviation of values        |
| "Total"             | sum of values                       |

* Intensity properties:

|                |                                |
| -------------- | ------------------------------ |
| "Power"        | mean of the squared values     |
| "RMSAmplitude" | root mean square of the values |
| "Loudness"     | an estimated loudness measure  |

* The loudness property uses Stevens's power law, computed using $\text{\textit{power}}^{0.67}$.

* Time domain properties:

|                           |                                            |
| ------------------------- | ------------------------------------------ |
| "CrestFactor"             | maximum divided by the root mean square    |
| "Entropy"                 | entropy of values                          |
| "LPC"                     | linear prediction coefficients             |
| "PeakToAveragePowerRatio" | maximum power divided by the average power |
| "TemporalCentroid"        | temporal centroid of values                |
| "ZeroCrossingRate"        | rate of zero crossings                     |
| "ZeroCrossings"           | number of zero crossings for the partition |

* The ``"LPC"`` property returns 12 coefficients that are estimated using linear predictive coding. Using ``{"LPC", n}``, ``n`` coefficients are returned.

* LPC coefficients are commonly used in analysis and the encoding of speech signals.

* The temporal centroid property gives the center of gravity of the energy of each partition. A temporal centroid of 0.5 means the center of the partition, while 0 and 1 correspond to the beginning and end of the partition.

* Frequency domain properties:

|                        |                                                          |
| ---------------------- | -------------------------------------------------------- |
| "FundamentalFrequency" | estimated fundamental frequency                          |
| "Formants"             | frequencies of the formants of the signal                |
| "HighFrequencyContent" | average of the linearly weighted power spectrum          |
| "MFCC"                 | mel-frequency cepstral coefficients                      |
| "SpectralCentroid"     | centroid of the power spectrum                           |
| "SpectralCrest"        | maximum divided by the mean of the power spectrum        |
| "SpectralFlatness"     | geometric mean divided by the mean of the power spectrum |
| "SpectralKurtosis"     | kurtosis of the magnitude spectrum                       |
| "SpectralRollOff"      | frequency below which most of the energy is concentrated |
| "SpectralSkewness"     | skewness of the magnitude spectrum                       |
| "SpectralSlope"        | estimated slope of the magnitude spectrum                |
| "SpectralSpread"       | measure of the bandwidth of the power spectrum           |

* Using ``{"FundamentalFrequency", thr, minfreq, maxfreq}``, only frequencies detected with confidence of ``thr`` or higher in the frequency range between ``minfreq`` and ``maxfreq`` are returned. The default values are optimized for signals including speech and instruments.

* Using ``{"Formants", n, m}``, up to ``n`` formants are returned using ``m`` LPC coefficients. By default, $n=5$ and ``m`` depends on the input sample rate.

* The MFCC property returns 13 coefficients. Using ``{"MFCC", n, m, minfreq, maxfreq}``, ``n`` coefficients are returned using ``m`` filters in the frequency range between ``minfreq`` and ``maxfreq``.

* Frequency domain properties computed on consecutive partitions:

|                           |                                                                    |
| ------------------------- | ------------------------------------------------------------------ |
| "ComplexDomainDistance"   | distance between predicted and measured Fourier transforms         |
| "ModifiedKullbackLeibler" | modified Kullback–Leibler distance between spectra                 |
| "Novelty"                 | estimated measure for significant changes                          |
| "PhaseDeviation"          | phase difference between predicted and measured Fourier transforms |
| "SpectralFlux"            | norm of the difference between consecutive spectra                 |

* Speech properties:

|                 |                                                |
| --------------- | ---------------------------------------------- |
| "VoiceActivity" | whether voice activity is detected (0s and 1s) |

* Speaker properties:

|                              |                             |
| ---------------------------- | --------------------------- |
| "SpeechAperiodicity"         | aperiodic (noisy) component |
| "SpeechFundamentalFrequency" | fundamental frequency       |
| "SpeechSpectralEnvelope"     | smoothed spectrogram data   |

* By default, a list of property values is returned. Other ``format`` specifications include:

|               |                                                 |
| ------------- | ----------------------------------------------- |
| Automatic     | determine the output automatically              |
| "Association" | format the result as an Association             |
| "Dataset"     | format the result as a Dataset                  |
| "List"        | format the result as a List                     |
| "RuleList"    | format the result as a list of Rule expressions |

* The following options can be given:

|                       |           |                                              |
| --------------------- | --------- | -------------------------------------------- |
| Alignment             | Center    | alignment of the time stamps with partitions |
| FourierParameters     | {-1, 1}   | Fourier parameters                           |
| Padding               | Automatic | padding scheme                               |
| PaddingSize           | Automatic | amount of padding                            |
| PartitionGranularity  | Automatic | audio partitioning specification             |
| MetaInformation       | None      | include additional metainformation           |
| MissingDataMethod     | None      | method to use for missing values             |
| ResamplingMethod      | Automatic | the method to use for resampling paths       |

* By default, measurements are returned at the center of each partition. Using the ``Alignment`` option, measurements can be returned at the beginning (``Left``) or end (``Right``) of each partition.

* By default, the signal is padded by half of the partition size at both ends with silence. For possible settings for ``Padding``, see the reference page for ``AudioPad``.

## Examples (40)

### Basic Examples (3)

Compute the RMS amplitude of an audio object:

```wl
In[1]:= AudioLocalMeasurements[\!\(\*AudioBox["![Embedded Audio Player](audio://content-76niz)"]\), "RMSAmplitude"]

Out[1]=
TemporalData[TimeSeries, {CompressedData["«1098»"], {{0., 2.391655328798186, 0.023219954648526078}}, 1, 
  {"Continuous", 1}, {"Discrete", 1}, 1, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 
 14.3]
```

Plot the measurement:

```wl
In[2]:= ListLinePlot[%, PlotRange -> All]

Out[2]= [image]
```

---

Compute the RMS amplitude of the first audio track of a ``Video`` object:

```wl
In[1]:= AudioLocalMeasurements[\!\(\*VideoBox["![Video Player: ExampleData/fish.mp4](video://content-2sfji)"]\), "RMSAmplitude"]

Out[1]=
TemporalData[TimeSeries, {CompressedData["«1680»"], {{0., 3.3901133786848074, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 
  1, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 
 14.1]
```

Plot the measurement:

```wl
In[2]:= ListLinePlot[%, PlotRange -> All]

Out[2]= [image]
```

---

Compute multiple measurements:

```wl
In[1]:= AudioLocalMeasurements[\!\(\*AudioBox["![Embedded Audio Player](audio://content-76niz)"]\), {"Power", "RMSAmplitude"}, Association]

Out[1]=
<|"Power" -> TemporalData[TimeSeries, {CompressedData["«1113»"], {{0., 2.391655328798186, 0.023219954648526078}}, 1, 
  {"Continuous", 1}, {"Discrete", 1}, 1, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolat ... 391655328798186, 0.023219954648526078}}, 1, 
  {"Continuous", 1}, {"Discrete", 1}, 1, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 
 14.3]|>
```

Plot the measurements:

```wl
In[2]:= ListLinePlot[%, PlotRange -> All]

Out[2]= [image]
```

### Scope (24)

#### Basic Uses (1)

Format the output as an ``Association``:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, {"RMSAmplitude", "Power"}, "Association"]

Out[2]=
<|"RMSAmplitude" -> TemporalData[TimeSeries, {CompressedData["«931»"], 
  {{0., 1.787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {" ... .787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]|>
```

Return a list of ``TimeSeries``:

```wl
In[3]:= AudioLocalMeasurements[a, {"RMSAmplitude", "Power"}, "List"]

Out[3]=
{TemporalData[TimeSeries, {CompressedData["«930»"], 
  {{0., 1.787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", Int ... 1.787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]}
```

Return a list of rules:

```wl
In[4]:= AudioLocalMeasurements[a, {"RMSAmplitude", "Power"}, "RuleList"]

Out[4]=
{"RMSAmplitude" -> TemporalData[TimeSeries, {CompressedData["«931»"], 
  {{0., 1.787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"I ... 1.787936507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]}
```

Return a ``Dataset``:

```wl
In[5]:= AudioLocalMeasurements[a, {"RMSAmplitude", "Power"}, "Dataset"]

Out[5]=
Dataset[Association["RMSAmplitude" -> TemporalData[TimeSeries, 
    {CompressedData["«934»"], {{0., 1.787936507936508, 0.023219954648526078}}, 
     1, {"Continuous", 1}, {"Discrete", 1}, 1, {MetaInformation -> None, MissingDataMethod -> None, 
    ... 6507936508, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
     {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
       {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]]]
```
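
When the result is an ``Association``, each measurement can be looked up by its property name. The following is a minimal sketch of that access pattern; the variable names ``res`` and ``rms`` are illustrative only:

```wl
a = Import["ExampleData/rule30.wav"];
res = AudioLocalMeasurements[a, {"RMSAmplitude", "Power"}, "Association"];

(* the property names are the keys of the association *)
Keys[res]

(* look up one TimeSeries, then inspect its length and first time stamps *)
rms = res["RMSAmplitude"];
{Length[rms["Values"]], Take[rms["Times"], UpTo[3]]}
```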

#### Histogram Properties (2)

Basic histogram properties:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, {"Max", "MaxAbs", "Min", "MinAbs"}]]

Out[2]= [image]

In[3]:= ListLinePlot[AudioLocalMeasurements[a, "Total"]]

Out[3]= [image]
```

---

Statistical histogram properties:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, {"Mean", "Median", "StandardDeviation"}]]

Out[2]= [image]
```

#### Intensity Properties (1)

Basic properties based on intensity:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, {"Power", "RMSAmplitude", "Loudness"}]//ListLinePlot[#, PlotRange -> All]&

Out[2]= [image]
```
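
The three intensity properties are related by the definitions given in the Details section: the RMS amplitude is the square root of the power, and the loudness is described as the power raised to 0.67. The sketch below checks both relations numerically. It assumes the built-in properties apply these definitions without any additional scaling, which is not confirmed here:

```wl
a = Import["ExampleData/rule30.wav"];
{power, rms, loudness} = AudioLocalMeasurements[a, {"Power", "RMSAmplitude", "Loudness"}];

(* largest deviation between Sqrt[power] and the reported RMS amplitude *)
Max[Abs[Sqrt[power["Values"]] - rms["Values"]]]

(* largest deviation between power^0.67 and the reported loudness *)
Max[Abs[power["Values"]^0.67 - loudness["Values"]]]
```

Both results should be close to zero if the assumption holds; any extra normalization would show up as a clearly nonzero difference.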

#### Time Domain Properties (6)

The ``"CrestFactor"`` property measures the ratio of the maximum to the RMS on each partition. ``"PeakToAveragePowerRatio"`` computes the same value squared:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, {"CrestFactor", "PeakToAveragePowerRatio"}]//ListLinePlot[#, PlotRange -> All]&

Out[2]= [image]
```
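
One way to inspect the squared relation is to overlay the squared crest factor on the peak-to-average power ratio. This is only a sketch: whether the two curves coincide exactly depends on how the built-in handles silent partitions and whether the maximum is taken on absolute values, neither of which is verified here:

```wl
a = Import["ExampleData/rule30.wav"];
{crest, papr} = AudioLocalMeasurements[a, {"CrestFactor", "PeakToAveragePowerRatio"}];

(* plot crest^2 and the power ratio against the partition index *)
ListLinePlot[{crest["Values"]^2, papr["Values"]}, PlotRange -> All, 
 PlotLegends -> {"CrestFactor squared", "PeakToAveragePowerRatio"}]
```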

---

The ``"TemporalCentroid"`` property computes the center of gravity of the energy distribution of each partition:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, "TemporalCentroid"], PlotRange -> All]

Out[2]= [image]
```

The output value is bounded between 0 and 1, where 0 means that all the energy is concentrated at the beginning of the partition.
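
The idea of a normalized center of gravity can be illustrated on a plain list of samples. The helper ``temporalCentroid`` below is hypothetical and written only for illustration; it is not claimed to match the built-in computation exactly. It weights each sample's energy by its relative position in the frame and assumes at least two samples and nonzero total energy:

```wl
(* hypothetical helper: energy-weighted mean position, rescaled to the range 0 to 1 *)
temporalCentroid[x_List] := With[{energy = x^2},
  Total[energy*Subdivide[0., 1., Length[x] - 1]]/Total[energy]]

temporalCentroid[{1., 0., 0., 0.}]  (* 0.  : all energy at the beginning *)
temporalCentroid[{1., 1., 1., 1.}]  (* 0.5 : energy spread evenly *)
temporalCentroid[{0., 0., 0., 1.}]  (* 1.  : all energy at the end *)
```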

---

``"ZeroCrossings"`` returns the number of zero crossings in a partition; ``"ZeroCrossingRate"`` normalizes it with the duration of the partition:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, "ZeroCrossingRate"], PlotRange -> All]

Out[2]= [image]
```

---

The ``"LPC"`` property returns 12 coefficients that are estimated using linear predictive coding:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, "LPC"]["Values"]//Transpose//ArrayPlot

Out[2]= [image]
```

Control the number of LPC coefficients for audio objects with high sample rates:

```wl
In[3]:= AudioLocalMeasurements[a, {"LPC", 66}]["Values"]//Transpose//ArrayPlot

Out[3]= [image]
```

---

Extract the frequencies of the formants of a signal:

```wl
In[1]:=
a = ExampleData[{"Audio", "MaleVoice"}];
formants = AudioLocalMeasurements[a, "Formants"]

Out[1]=
TemporalData[TimeSeries, {CompressedData["«4996»"], 
  {{0., 2.391655328798186, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 5, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 5}}, False, 14.3]

In[2]:= Show[Spectrogram[a, AspectRatio -> 1, PlotRange -> {All, {0, 7000}}], formants//ListLinePlot]

Out[2]= [image]
```

Control the number of formants and LPC coefficients used for the calculation:

```wl
In[3]:= formants = AudioLocalMeasurements[a, {"Formants", 2, 40}]

Out[3]=
TemporalData[TimeSeries, {CompressedData["«2088»"], {{0., 2.391655328798186, 0.023219954648526078}}, 1, {"Continuous", 1}, 
  {"Discrete", 1}, 2, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 2}}, False, 
 14.3]

In[4]:= Show[Spectrogram[a, AspectRatio -> 1, PlotRange -> {All, {0, 7000}}], formants//ListLinePlot]

Out[4]= [image]
```

---

The entropy of the audio signal:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, "Entropy"]//ListLinePlot[#, PlotRange -> All]&

Out[2]= [image]
```

#### Frequency Domain Properties (8)

``"SpectralCrest"`` measures the ratio between the maximum and the mean of the power spectrum:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, {"SpectralCrest"}]//ListLinePlot

Out[2]= [image]
```

---

``"SpectralRollOff"`` measures the frequency below which 95% of the energy of the spectrum is concentrated:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, {"SpectralRollOff"}]//ListLinePlot

Out[2]= [image]
```

---

``"SpectralSlope"`` is a measure of the slope of the power spectrum:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, {"SpectralSlope"}]//ListLinePlot

Out[2]= [image]
```

---

``"SpectralFlatness"`` is a measure of the flatness of the power spectrum:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, {"SpectralFlatness"}]//ListLinePlot

Out[2]= [image]
```

---

Common statistical properties computed on the power spectrum:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, {"SpectralCentroid", "SpectralKurtosis", "SpectralSpread"}]//ListLinePlot[#, PlotRange -> All]&

Out[2]= [image]
```

---

The ``"FundamentalFrequency"`` property estimates the fundamental frequency of monophonic sounds:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, "FundamentalFrequency"]//ListLinePlot

Out[2]= [image]
```

Control the sensitivity of the detection:

```wl
In[3]:= AudioLocalMeasurements[a, {"FundamentalFrequency", .2}]//ListLinePlot

Out[3]= [image]
```

Control the frequency range on which the detection is performed:

```wl
In[4]:= AudioLocalMeasurements[a, {"FundamentalFrequency", .2, 100, 700}]//ListLinePlot

Out[4]= [image]
```

---

``"HighFrequencyContent"`` computes the average of the power spectrum using weights that increase linearly with frequency:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, "HighFrequencyContent"]//ListLinePlot[#, PlotRange -> All]&

Out[2]= [image]
```

The linear weighting of the spectrum assigns more importance to events happening in the higher end of the spectrum, making ``"HighFrequencyContent"`` a good candidate for transient detection.
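
The effect of the linear weighting can be sketched on a single frame. The helper ``highFrequencyContent`` below is hypothetical and only illustrates the idea of a linearly weighted power spectrum; the normalization and windowing used by the built-in property may differ:

```wl
(* hypothetical helper: mean of the power spectrum weighted by the bin index *)
highFrequencyContent[frame_List] := Module[{n, spectrum},
  n = Floor[Length[frame]/2];                   (* keep the positive frequencies *)
  spectrum = Abs[Fourier[frame]][[;; n]]^2;     (* power spectrum *)
  Mean[Range[0, n - 1]*spectrum]]

(* two sinusoids of equal amplitude, at bins 5 and 50 of a 256-sample frame *)
low = Table[Sin[2 Pi 5 t/256.], {t, 0, 255}];
high = Table[Sin[2 Pi 50 t/256.], {t, 0, 255}];

highFrequencyContent[low] < highFrequencyContent[high]  (* True *)
```

Both frames carry the same power, so the difference comes entirely from the frequency weighting.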

---

The ``"MFCC"`` property returns 13 coefficients of the mel-frequency cepstrum:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= AudioLocalMeasurements[a, "MFCC"]["Values"]//Transpose//ListDensityPlot

Out[2]= [image]
```

Control the number of coefficients and number of filters, as well as the frequency range:

```wl
In[3]:= AudioLocalMeasurements[a, {"MFCC", 30, 40, Quantity[40, "Hertz"], Quantity[30000, "Radians"/"Seconds"]}]["Values"]//Transpose//ListDensityPlot

Out[3]= [image]
```
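
A quick way to see how many coefficients are returned per partition is to inspect the dimensions of the values. This is a sketch; the parameter values in the second call (20 coefficients, 40 filters, 40 Hz to 4000 Hz) are arbitrary choices made for illustration:

```wl
a = ExampleData[{"Audio", "PianoScale"}];

(* {number of partitions, number of coefficients} with the default settings *)
Dimensions[AudioLocalMeasurements[a, "MFCC"]["Values"]]

(* the same check after requesting 20 coefficients from 40 filters *)
Dimensions[AudioLocalMeasurements[a, 
   {"MFCC", 20, 40, Quantity[40, "Hertz"], Quantity[4000, "Hertz"]}]["Values"]]
```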

#### Frequency Domain Properties Computed on Neighboring Partitions (2)

Properties based on different measures of the distance between Fourier transforms of two consecutive frames:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= Rescale /@ AudioLocalMeasurements[a, {"ComplexDomainDistance", "ModifiedKullbackLeibler", "PhaseDeviation", "SpectralFlux"}]//ListLinePlot

Out[2]= [image]
```

---

The ``"Novelty"`` property measures how much a frame differs from its neighboring frames:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, "Novelty"], PlotRange -> All]

Out[2]= [image]
```

#### Speech & Speaker Properties (4)

The ``"VoiceActivity"`` property is an indicator function of voiced sections of a speech signal:

```wl
In[1]:=
a = ExampleData[{"Audio", "MaleVoice"}];
va = AudioLocalMeasurements[a, "VoiceActivity"]

Out[1]=
TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  ...  2.391655328798186, 0.023219954648526078}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]
```

Show the voice activity with the audio waveform plot:

```wl
In[2]:= Show[AudioPlot[a, PlotLayout -> "Averaged"], ListLinePlot[va, PlotStyle -> Red], ImageSize -> Medium]

Out[2]= [image]
```

Use a smaller 10-millisecond window to increase the resolution:

```wl
In[3]:=
va = AudioLocalMeasurements[a, "VoiceActivity", PartitionGranularity -> .01];
Show[AudioPlot[a, PlotLayout -> "Averaged"], ListLinePlot[va, PlotStyle -> Red], ImageSize -> Medium]

Out[3]= [image]
```
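
Because the result is a series of 0s and 1s, its mean gives the fraction of partitions flagged as voiced. The sketch below uses that to estimate the voiced duration; the estimate is approximate, since padding adds partitions at both ends of the signal:

```wl
a = ExampleData[{"Audio", "MaleVoice"}];
va = AudioLocalMeasurements[a, "VoiceActivity", PartitionGranularity -> .01];

(* fraction of partitions in which voice activity is detected *)
voicedFraction = N[Mean[va["Values"]]]

(* rough estimate of the total voiced duration *)
voicedFraction*Duration[a]
```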

---

The ``"SpeechFundamentalFrequency"`` property estimates the fundamental frequency of speech:

```wl
In[1]:= a = ExampleData[{"Audio", "MaleVoice"}];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, "SpeechFundamentalFrequency", PartitionGranularity -> {Quantity[25, "Milliseconds"], Quantity[5, "Milliseconds"]}], PlotRange -> All]

Out[2]= [image]
```

---

The ``"SpeechSpectralEnvelope"`` property returns the coefficients of the spectral envelope of the signal:

```wl
In[1]:=
a = ExampleData[{"Audio", "MaleVoice"}];
se = AudioLocalMeasurements[a, "SpeechSpectralEnvelope"]

Out[1]= TemporalData[«4»]
```

Plot the values of the result:

```wl
In[2]:= MatrixPlot[Transpose@Log[Values[se]], DataReversed -> True, AspectRatio -> 1]

Out[2]= [image]
```

---

The ``"SpeechAperiodicity"`` property returns the coefficients of the aperiodic component of the signal:

```wl
In[1]:=
a = ExampleData[{"Audio", "MaleVoice"}];
ap = AudioLocalMeasurements[a, "SpeechAperiodicity"]

Out[1]=
TemporalData[TimeSeries, {CompressedData["«228310»"], {{0., 2.391655328798186, 0.023219954648526078}}, 1, 
  {"Continuous", 1}, {"Discrete", 1}, 1025, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1025}}, 
 False, 14.3]
```

Plot the values of the result:

```wl
In[2]:= MatrixPlot[Transpose@Log[Values[ap]], DataReversed -> True, AspectRatio -> 1]

Out[2]= [image]
```

### Options (5)

#### Alignment (1)

The time stamps of the resulting ``TimeSeries`` are by default placed in the center of each partition:

```wl
In[1]:=
a = Import["ExampleData/rule30.wav"];
audioplot = AudioPlot[a, AspectRatio -> 1 / 2, PlotRange -> All, PlotRangePadding -> {.15, .05}, GridLines -> {Table[i, {i, -.15, QuantityMagnitude@Duration@a, .3}], None}];
res = AudioLocalMeasurements[a, "Max", PartitionGranularity -> {.3, .3}];

In[2]:=
Show[audioplot, 
	ListPlot[res, PlotStyle -> Red, Filling -> Axis, PlotMarkers -> Automatic, PlotRange -> {-1, 1}]]

Out[2]= [image]
```

Use ``Alignment -> Right`` to place the computed property at the end of each partition:

```wl
In[3]:=
res = AudioLocalMeasurements[a, "Max", PartitionGranularity -> {.3, .3}, Alignment -> Right];
Show[audioplot, 
	ListPlot[res, PlotStyle -> Red, Filling -> Axis, PlotMarkers -> Automatic, PlotRange -> {-1, 1}]]

Out[3]= [image]
```
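
The effect of ``Alignment`` can also be read directly from the time stamps. In this sketch, ``times`` is a hypothetical helper defined only for the comparison:

```wl
a = Import["ExampleData/rule30.wav"];

(* hypothetical helper: time stamps of the result for a given alignment *)
times[align_] := AudioLocalMeasurements[a, "Max", 
   PartitionGranularity -> {.3, .3}, Alignment -> align]["Times"];

(* first few time stamps for each alignment setting *)
Take[times[#], UpTo[3]] & /@ {Left, Center, Right}
```

The three lists should differ only by a constant shift related to the 0.3-second partition size.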

#### Padding (1)

By default, ``"Silent"`` padding is used:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, "Max", PaddingSize -> 1]//ListLinePlot

Out[2]= [image]
```

Use ``"Reversed"`` padding:

```wl
In[3]:= AudioLocalMeasurements[a, "Max", Padding -> "Reversed", PaddingSize -> 1]//ListLinePlot

Out[3]= [image]
```

#### PaddingSize (1)

By default, padding equal to half the partition size is applied at both ends of the signal:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[AudioLocalMeasurements[a, "Power"]]

Out[2]= [image]
```

Increase the amount of padding:

```wl
In[3]:= ListLinePlot[AudioLocalMeasurements[a, "Power", PaddingSize -> 1], PlotRange -> All]

Out[3]= [image]
```

Use different padding amounts at the beginning and end of the signal:

```wl
In[4]:= ListLinePlot[AudioLocalMeasurements[a, "Power", PaddingSize -> {0, 2}], PlotRange -> All]

Out[4]= [image]
```

#### PartitionGranularity (2)

Specify a partition size of 100 ms:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= AudioLocalMeasurements[a, "Power", PartitionGranularity -> Quantity[100, "Milliseconds"]]//ListLinePlot

Out[2]= [image]
```

Use an offset of 10 ms:

```wl
In[3]:= AudioLocalMeasurements[a, "Power", PartitionGranularity -> {Quantity[100, "Milliseconds"], Quantity[10, "Milliseconds"]}]//ListLinePlot

Out[3]= [image]
```

Use a smoothing window:

```wl
In[4]:= AudioLocalMeasurements[a, "Power", PartitionGranularity -> {Quantity[100, "Milliseconds"], Quantity[10, "Milliseconds"], HannWindow}]//ListLinePlot

Out[4]= [image]
```
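
The offset controls how densely the signal is sampled: for a fixed partition size, a smaller offset yields more measurements. A minimal sketch, in which ``count`` is a hypothetical helper and the granularity values are arbitrary:

```wl
a = Import["ExampleData/rule30.wav"];

(* hypothetical helper: number of measurements for a given granularity *)
count[gran_] := Length[
   AudioLocalMeasurements[a, "Power", PartitionGranularity -> gran]["Values"]];

(* 100 ms partitions with offsets of 100 ms, 50 ms and 10 ms *)
count /@ {{.1, .1}, {.1, .05}, {.1, .01}}
```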

---

All frequency domain properties use a smoothing window by default:

```wl
In[1]:= a = Import["ExampleData/rule30.wav"];

In[2]:= ListLinePlot[{AudioLocalMeasurements[a, "SpectralFlux", PartitionGranularity -> {.05, .01, Automatic}], AudioLocalMeasurements[a, "SpectralFlux", PartitionGranularity -> {.05, .01, None}]}, PlotLegends -> {"HannWindow", "None"}]

Out[2]= [image]
```

### Applications (4)

Detect the transients in a complex audio signal:

```wl
In[1]:= a = \!\(\*AudioBox["![Embedded Audio Player](audio://content-9ghc6)"]\);
```

Compute a "detection function" by averaging several measurements from the original signal:

```wl
In[2]:=
properties = {"ComplexDomainDistance", "HighFrequencyContent", "ModifiedKullbackLeibler", "Novelty", "PhaseDeviation"};
detectionFunctions = Rescale /@ AudioLocalMeasurements[a, properties, "Association", PartitionGranularity -> {.02, .005}];
detectionFunction = Mean[Lookup[detectionFunctions, {"HighFrequencyContent", "ModifiedKullbackLeibler", "Novelty"}]]

Out[2]=
TemporalData[TimeSeries, {CompressedData["«14852»"], 
  {{0., 6.984126984126984, 0.004988662131519274}}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {MetaInformation -> None, MissingDataMethod -> None, ResamplingMethod -> 
    {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, False, 14.3]
```

Filter the detection function using an adaptive threshold:

```wl
In[3]:= filteredDetectionFunction = (detectionFunction - MedianFilter[detectionFunction, .02])

Out[3]=
TemporalData[TimeSeries, {CompressedData["«11516»"], {{0., 6.984126984126984, 0.004988662131519274}}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, {MetaInformation -> None, MissingDataMethod -> None, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1, Method -> "Spline"}, 
   TemporalRegularity -> True, ValueDimensions -> 1}}, False, 14.3]
```

Find the peaks of the filtered detection function:

```wl
In[4]:=
peaks = FindPeaks[filteredDetectionFunction, 0, 0, 0.07];
ListLinePlot[filteredDetectionFunction, PlotRange -> All, ImageSize -> Medium, Epilog -> {Red, PointSize[0.02], Point[peaks//Normal]}]

Out[4]= [image]
```

Plot the detected transients on the waveform:

```wl
In[5]:= AudioPlot[a, ColorFunction -> Function[{x, y}, If[AnyTrue[peaks["Times"], Abs[x - #] < .015&], RGBColor[1, 0.2, 0.2], RGBColor[0.368417, 0.506779, 0.709798]]], PlotRange -> {All, All}, ImageSize -> Medium, FillingStyle -> Opacity[.9]]

Out[5]= [image]
```

---

Compute a signature for an audio object:

```wl
In[1]:= a = Audio["http://exampledata.wolfram.com/bach.mp3"]

Out[1]= \!\(\*AudioBox["![Audio Player: http://exampledata.wolfram.com/bach.mp3](audio://content-og07s)"]\)
```

Compute the MFCC feature and extract the values:

```wl
In[2]:= mfcc = AudioLocalMeasurements[AudioResample[a, 11025], "MFCC", PartitionGranularity -> {.5, .25}]["Values"];
```

Plot the resulting distance matrix:

```wl
In[3]:= MatrixPlot[DistanceMatrix[mfcc], DataRange -> {{0, QuantityMagnitude[Duration@a, "s"]}, {0, QuantityMagnitude[Duration@a, "s"]}}, ImageSize -> 300, FrameTicks -> {Automatic, Automatic}]

Out[3]= [image]
```

---

Compare two recordings of the same sentence using dynamic time warping:

```wl
In[1]:= alice = {\!\(\*AudioBox["![Embedded Audio Player](audio://content-kk35x)"]\), \!\(\*AudioBox["![Embedded Audio Player](audio://content-fvmpy)"]\)};
```

Compute and plot the MFCC features for the recordings:

```wl
In[2]:= mfcc = AudioLocalMeasurements[#, "MFCC", PartitionGranularity -> {.05, .01}]["Values"]& /@ alice;

In[3]:= Column[MatrixPlot[#, PlotTheme -> "Minimal", MaxPlotPoints -> 2000, AspectRatio -> 1 / 10, ImageSize -> Medium]& /@ Transpose /@ mfcc]

Out[3]= [image]
```

Compute the dynamic time warping correspondence between the two recordings using ``WarpingCorrespondence``:

```wl
In[4]:= {n, m} = WarpingCorrespondence[mfcc[[1]], mfcc[[2]]];
```

Plot the correspondence between the two recordings:

```wl
In[5]:=
dur = QuantityMagnitude[Duration[alice[[1]]], "s"];
s = {n, m}\[Transpose] / Max[{n, m}]dur;
Labeled[
	ListLinePlot[
	s, AspectRatio -> 1, PlotStyle -> Thickness[.01], ImageSize -> Medium, Prolog -> {RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], {Line[{{#[[1]], 0}, #}], Line[{{0, #[[2]]}, #}]}& /@ (s[[ ;;  ;; 100]])}
	], 
	AudioPlot[#, PlotStyle -> RGBColor[0.560181, 0.691569, 0.194885], Frame -> False, Axes -> False, ImageSize -> Medium, AspectRatio -> 1 / 15]& /@ alice, {Bottom, Left}, RotateLabel -> True, Spacings -> {0, 0}]

Out[5]= Labeled[[image], [image][image]]
```

---

Use the ``"MFCC"`` measurement as a feature to compute the distance between various elements of the ``ExampleData["Audio"]`` collection:

```wl
In[1]:=
list = Select[ExampleData["Audio"], ExampleData[#, "Duration"] < 10&];
a = ConformAudio[AudioNormalize@AudioChannelMix[#, 1]& /@ ExampleData[#, "Audio"]& /@ list, SampleRate -> 11025];

In[2]:= mfcc = AudioLocalMeasurements[#, "MFCC", PartitionGranularity -> {.05, .01}]["Values"]& /@ a;

In[3]:= ticks = Thread[{Range[Length@list], Text /@ list[[All, 2]]}];MatrixPlot[DistanceMatrix[mfcc], ImageSize -> Medium, FrameTicks -> {ticks, Apply[Rotate[#, Pi / 2]&, ticks, {2}]}]

Out[3]= [image]
```

### Possible Issues (1)

``"FundamentalFrequency"`` returns a ``Missing[]`` value for partitions in which the fundamental frequency cannot be estimated, for example when the partition contains silence or polyphonic sound:

```wl
In[1]:= a = ExampleData[{"Audio", "PianoScale"}, "Audio"];

In[2]:= AudioLocalMeasurements[a, "FundamentalFrequency"][0.]

Out[2]= Missing[]
```

The fundamental frequency of a polyphonic sound is not defined:

```wl
In[3]:= AudioLocalMeasurements[Import["ExampleData/rule30.wav"], "FundamentalFrequency"]["Values"]//Short

Out[3]//Short= {Missing[], Missing[], 92.503, Missing[], «31», Missing[], Missing[], Missing[], Missing[]}
```
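
One way to work with such results is to drop the missing entries before computing statistics. The sketch below uses a hypothetical list `f0` in place of real measurement values:

```wl
(* Hypothetical values of the kind "FundamentalFrequency" can return *)
f0 = {Missing[], Missing[], 92.503, Missing[], 93.1};

(* Keep only the partitions for which an estimate exists *)
valid = DeleteMissing[f0];

(* Average fundamental frequency over the valid partitions *)
Mean[valid]
```

Dropping entries discards their time stamps, so this suits summary statistics better than anything that needs the values aligned with time.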

### Neat Examples (3)

Replicate the frequency and amplitude of a flute note using ``AudioGenerator``:

```wl
In[1]:= a = \!\(\*AudioBox["![Embedded Audio Player](audio://content-llour)"]\);
```

Compute the ``"RMSAmplitude"`` and ``"FundamentalFrequency"`` measurements:

```wl
In[2]:= m = AudioLocalMeasurements[a, {"RMSAmplitude", "FundamentalFrequency"}, MissingDataMethod -> {"Interpolation", InterpolationOrder -> 0}];
```

Use the ``"FundamentalFrequency"`` measurement to control the frequency of the result:

```wl
In[3]:= AudioGenerator[{"Sin", m["FundamentalFrequency"]}]

Out[3]= \!\(\*AudioBox["![Embedded Audio Player](audio://content-bub1s)"]\)
```

Use the ``"RMSAmplitude"`` measurement to control the amplitude:

```wl
In[4]:= %×AudioGenerator[m["RMSAmplitude"]]

Out[4]= \!\(\*AudioBox["![Embedded Audio Player](audio://content-50vzr)"]\)
```

---

Decode a Morse signal:

```wl
In[1]:= morse = \!\(\*AudioBox["![Embedded Audio Player](audio://content-gycdz)"]\);
```

Calculate the RMS amplitude of the signal, normalize it by its maximum and round it to 0 or 1:

```wl
In[2]:=
rms = AudioLocalMeasurements[morse, "RMSAmplitude", PartitionGranularity -> {.01, .002}];
rounded = Round[rms / Max@rms];
ListLinePlot[rounded]

Out[2]= [image]
```

Select only the points where there is a transient:

```wl
In[3]:=
crossings = TimeSeriesInsert[TimeSeries[CrossingDetect[rounded["Values"] - .5, CornerNeighbors -> True], {rounded["Times"]}], {0, 1}];
transients = TimeSeries@Select[Normal@crossings, #[[2]] == 1&];
```

Make sure that the first point is at ``t = 0`` and compute the minimum time increment:

```wl
In[4]:=
shifted = TimeSeriesShift[transients, -transients["FirstTime"]];
dit = MinimumTimeIncrement[shifted];
```
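
The minimum time increment serves as the Morse unit (one "dit"): every tone or gap is then measured in multiples of it. A minimal sketch with hypothetical transition times (`times`, `ts`, and `unit` are illustrative names, and the values are invented):

```wl
(* Hypothetical transition times in seconds: tone on, off, on, off, ... *)
times = {0, 1, 2, 5, 6, 7};

(* Wrap the times in a TimeSeries; the values themselves are not used here *)
ts = TimeSeries[ConstantArray[1, Length[times]], {times}];

(* The shortest interval between consecutive transitions is one unit *)
unit = MinimumTimeIncrement[ts];

(* Durations in units: odd positions are tones, even positions are gaps *)
Differences[times]/unit
```

Here the tones last 1, 3, and 1 units, separated by gaps of 1 unit, which corresponds to dot, dash, dot.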

Define the Morse code mappings:

```wl
In[5]:= code = <|".-" -> "a", "-..." -> "b", "-.-." -> "c", "-.." -> "d", "." -> "e", "..-." -> "f", "--." -> "g", "...." -> "h", ".." -> "i", ".---" -> "j", "-.-" -> "k", ".-.." -> "l", "--" -> "m", "-." -> "n", "---" -> "o", ".--." -> "p", "--.-" -> "q", ".-." -> "r", "..." -> "s", "-" -> "t", "..-" -> "u", "...-" -> "v", ".--" -> "w", "-..-" -> "x", "-.--" -> "y", "--.." -> "z", ".----" -> "1", "..---" -> "2", "...--" -> "3", "....-" -> "4", "....." -> "5", "-...." -> "6", "--..." -> "7", "---.." -> "8", "----." -> "9", "-----" -> "0", ".-.-.-" -> ".", "--..--" -> ",", "-.-.--" -> "!", "..--.." -> "?", "_" -> " "|>;
```

Decode the signal:

```wl
In[6]:= StringJoin[StringSplit[StringJoin[Table[{Differences[shifted["Times"]][[i]], Mod[i, 2]}, {i, Length@Differences[shifted["Times"]]}] /. {{x_, 1} /; .5dit < x < 1.5dit -> ".", {x_, 1} /; 2.5dit < x < 3.5dit -> "-", {x_, 0} /; 2.5dit < x < 3.5dit -> "/", {x_, 0} /; .5dit < x < 1.5dit -> Nothing, {x_, 0} /; 5dit < x < 12dit -> "/_/"}], "/"] /. Normal[code]]

Out[6]= "hello world"
```
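
The last step of the decoder, splitting at `"/"` and translating each group, can be looked at in isolation. This sketch uses a hypothetical intermediate string and a small subset of the table (`codeSubset`, `symbols`, and `decoded` are illustrative names). It uses `Lookup` where the code above uses replacement rules:

```wl
(* A small subset of the Morse table, enough for this illustration *)
codeSubset = <|"...." -> "h", "." -> "e", ".-.." -> "l", "---" -> "o"|>;

(* Hypothetical intermediate string: letters separated by "/" *)
symbols = "...././.-../.-../---";

(* Split into letters and translate each one *)
decoded = StringJoin[Lookup[codeSubset, StringSplit[symbols, "/"]]]
```

For this input the result is `"hello"`.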

---

Create a 3D-printable model of the waveform of an audio object:

```wl
In[1]:=
a = ExampleData[{"Audio", "Drums"}];
AudioPlot[a, PlotRange -> All]

Out[1]= [image]

In[2]:= dur = QuantityMagnitude[Duration@a, "s"]

Out[2]= 4.72499
```

Compute the ``"Min"`` and ``"Max"`` measurements:

```wl
In[3]:=
aspectRatio = 1 / 3;
numPoints = 100;
minmax  = Normal[aspectRatio dur AudioLocalMeasurements[a, #, PartitionGranularity -> {dur / numPoints, dur / numPoints}]]& /@ {"Min", "Max"};
```

Create a 3D model of the waveform:

```wl
In[4]:= RegionProduct[Polygon[Join[minmax[[1]], Reverse[minmax[[2]]]]], Line[{{0}, {0.1 dur}}]]

Out[4]= [image]
```
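
The construction walks the lower envelope from left to right and the upper envelope back from right to left to close a 2D polygon, then extrudes it by taking the product with a 1D segment. A minimal sketch on a hypothetical outline (`outline` and `solid` are illustrative names, and the coordinates are invented):

```wl
(* Hypothetical outline: lower envelope left to right, then upper envelope right to left *)
outline = Polygon[{{0, -1}, {1, -2}, {2, -1}, {2, 1}, {1, 2}, {0, 1}}];

(* Extrude along a third axis by taking the product with a 1D segment *)
solid = RegionProduct[outline, Line[{{0}, {0.5}}]];

(* The product of a 2D region and a 1D region is embedded in 3D *)
RegionEmbeddingDimension[solid]
```

The length of the segment sets the thickness of the printed model.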

Export the model to an STL file for 3D printing:

```wl
In[5]:= Printout3D[%, "waveform.stl"]
```

## See Also

* [`AudioIntervals`](https://reference.wolfram.com/language/ref/AudioIntervals.en.md)
* [`AudioAnnotate`](https://reference.wolfram.com/language/ref/AudioAnnotate.en.md)
* [`PitchRecognize`](https://reference.wolfram.com/language/ref/PitchRecognize.en.md)
* [`SpeechRecognize`](https://reference.wolfram.com/language/ref/SpeechRecognize.en.md)
* [`AudioDistance`](https://reference.wolfram.com/language/ref/AudioDistance.en.md)
* [`AudioPartition`](https://reference.wolfram.com/language/ref/AudioPartition.en.md)
* [`AudioBlockMap`](https://reference.wolfram.com/language/ref/AudioBlockMap.en.md)
* [`AudioMeasurements`](https://reference.wolfram.com/language/ref/AudioMeasurements.en.md)

## Related Guides

* [Audio Processing](https://reference.wolfram.com/language/guide/AudioProcessing.en.md)
* [Signal Processing](https://reference.wolfram.com/language/guide/SignalProcessing.en.md)
* [Sound and Sonification](https://reference.wolfram.com/language/guide/SoundAndSonification.en.md)
* [Video Computation: Update History](https://reference.wolfram.com/language/guide/VideoComputation-UpdateHistory.en.md)
* [Audio Representation](https://reference.wolfram.com/language/guide/AudioRepresentation.en.md)
* [Video Analysis](https://reference.wolfram.com/language/guide/VideoAnalysis.en.md)
* [Speech Computation](https://reference.wolfram.com/language/guide/SpeechComputation.en.md)
* [Signal Visualization & Analysis](https://reference.wolfram.com/language/guide/SignalAnalysis.en.md)
* [Audio Analysis](https://reference.wolfram.com/language/guide/AudioAnalysis.en.md)

## History

* [Introduced in 2016 (11.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn110.en.md) \| [Updated in 2017 (11.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn111.en.md) ▪ [2020 (12.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn121.en.md) ▪ [2024 (14.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn141.en.md)