AudioLocalMeasurements
AudioLocalMeasurements[audio,"prop"]
computes the property "prop" locally for partitions of audio.
AudioLocalMeasurements[audio,{"prop1","prop2",…}]
computes several properties "propi".
AudioLocalMeasurements[audio,"prop",format]
returns the measurements in the specified output format.
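The three call forms above can be sketched as follows. This is a minimal illustration, not an example from the original page: the test signal built with AudioGenerator and the variable names audio, rms and stats are assumptions chosen for the sketch.

```wolfram
(* Assumed test signal: one second of a 440 Hz sine wave *)
audio = AudioGenerator[{"Sin", 440}, 1];

(* A single property: a TimeSeries with one value per partition *)
rms = AudioLocalMeasurements[audio, "RMSAmplitude"];

(* Several properties, with the result formatted as an Association *)
stats = AudioLocalMeasurements[audio, {"Min", "Max"}, "Association"];

(* Plot the RMS amplitude over time *)
ListLinePlot[rms, PlotRange -> All]
```

With the "Association" format, each measurement can then be looked up by property name, for example stats["Max"].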
Details and Options
 The measurements computed by AudioLocalMeasurements are also known as audio features or descriptors.
 AudioLocalMeasurements returns a TimeSeries with one measurement for each partition.
 For multichannel audio, measurements are computed on the average of the channel values.
 Basic histogram properties:

"Max"                  maximum value
"MaxAbs"               maximum absolute value
"Min"                  minimum value
"MinAbs"               minimum absolute value
"MinMax"               minimum and maximum values
"MinMaxAbs"            minimum and maximum absolute values
"Mean"                 mean value
"Median"               median value
"StandardDeviation"    standard deviation of values
"Total"                sum of values

 Intensity properties:

"Power"                mean of the squared values
"RMSAmplitude"         root mean square of the values
"Loudness"             an estimated loudness measure

 The "Loudness" property is computed using Stevens's power law.
 Time domain properties:

"CrestFactor"                maximum divided by the root mean square
"Entropy"                    entropy of values
"LPC"                        linear prediction coefficients
"PeakToAveragePowerRatio"    maximum power divided by the average power
"TemporalCentroid"           temporal centroid of values
"ZeroCrossingRate"           rate of zero crossings
"ZeroCrossings"              number of zero crossings for the partition

 The "LPC" property returns 12 coefficients that are estimated using linear predictive coding. Using {"LPC",n}, n coefficients are returned.
 LPC coefficients are commonly used in the analysis and encoding of speech signals.
 The "TemporalCentroid" property gives the center of gravity of the energy of each partition. A temporal centroid of 0.5 corresponds to the center of the partition, while 0 and 1 correspond to the beginning and end of the partition.
 Frequency domain properties:

"FundamentalFrequency"    estimated fundamental frequency
"Formants"                frequencies of the formants of the signal
"HighFrequencyContent"    average of the linearly weighted power spectrum
"MFCC"                    mel-frequency cepstral coefficients
"SpectralCentroid"        centroid of the power spectrum
"SpectralCrest"           maximum divided by the mean of the power spectrum
"SpectralFlatness"        geometric mean divided by the mean of the power spectrum
"SpectralKurtosis"        kurtosis of the magnitude spectrum
"SpectralRollOff"         frequency below which most of the energy is concentrated
"SpectralSkewness"        skewness of the magnitude spectrum
"SpectralSlope"           estimated slope of the magnitude spectrum
"SpectralSpread"          measure of the bandwidth of the power spectrum

 Using {"FundamentalFrequency",thr,minfreq,maxfreq}, only frequencies detected with a confidence of thr or higher in the frequency range between minfreq and maxfreq are returned. The default values are optimized for signals including speech and instruments.
 Using {"Formants",n,m}, up to n formants are returned using m LPC coefficients. By default, m depends on the input sample rate.
 The "MFCC" property returns 13 coefficients. Using {"MFCC",n,m,minfreq,maxfreq}, n coefficients are returned using m filters in the frequency range between minfreq and maxfreq.
 Frequency domain properties computed on consecutive partitions:

"ComplexDomainDistance"      distance between predicted and measured Fourier
"ModifiedKullbackLeibler"    modified Kullback–Leibler distance between spectra
"Novelty"                    estimated measure for significant changes
"PhaseDeviation"             phase difference between predicted and measured Fourier
"SpectralFlux"               norm of the difference between consecutive spectra

 Speech properties:

"VoiceActivity"    whether voice activity is detected (0s and 1s)

 Speaker properties:

"SpeechAperiodicity"            aperiodic (noisy) component
"SpeechFundamentalFrequency"    fundamental frequency
"SpeechSpectralEnvelope"        smoothed spectrogram data

 By default, a list of property values is returned. Other format specifications include:

Automatic        determine the output automatically
"Association"    format the result as an Association
"Dataset"        format the result as a Dataset
"List"           format the result as a List
"RuleList"       format the result as a list of Rule expressions

 The following options can be given:

Alignment               Center       alignment of the time stamps with partitions
FourierParameters       {1,1}        Fourier parameters
Padding                 Automatic    padding scheme
PaddingSize             Automatic    amount of padding
PartitionGranularity    Automatic    audio partitioning specification
MetaInformation         None         include additional metainformation
MissingDataMethod       None         method to use for missing values
ResamplingMethod        Automatic    the method to use for resampling paths

 By default, measurements are returned at the center of each partition. Using the Alignment option, measurements can be returned at the beginning (Left) or end (Right) of each partition.
 By default, the signal is padded with silence at both ends by half of the partition size. For possible settings for Padding, see the reference page for AudioPad.
Examples
Basic Examples (2)
Scope (24)
Basic Uses (1)
Time Domain Properties (6)
The "CrestFactor" property measures the ratio of the maximum to the RMS on each partition. "PeakToAveragePowerRatio" computes the same value squared:
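The relation between the two properties can be checked numerically. The following is a sketch under stated assumptions: the sine test signal and the variable names are illustrative, and both measurements are assumed to use the same default partitioning so that their values line up one to one.

```wolfram
(* Assumed test signal: one second of a 440 Hz sine wave *)
audio = AudioGenerator[{"Sin", 440}, 1];

cf = AudioLocalMeasurements[audio, "CrestFactor"];
papr = AudioLocalMeasurements[audio, "PeakToAveragePowerRatio"];

(* Largest deviation between the squared crest factor and the
   peak-to-average power ratio across all partitions *)
Max[Abs[cf["Values"]^2 - papr["Values"]]]
```

If the relation stated above holds, the result should be close to zero, up to numerical precision.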
The "TemporalCentroid" property computes the center of gravity of the energy distribution of each partition:
The output value is bounded between 0 and 1, where 0 means that all the energy is concentrated at the beginning of the partition.
"ZeroCrossings" returns the number of zero crossings in a partition; "ZeroCrossingRate" normalizes it by the duration of the partition:
The "LPC" property returns 12 coefficients that are estimated using linear predictive coding:
Control the number of LPC coefficients for audio objects with high sample rates:
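A possible way to request a different number of coefficients is sketched below. The pink-noise signal, the 96 kHz sample rate and the choice of 20 coefficients are assumptions made for illustration only.

```wolfram
(* Assumed test signal: one second of pink noise at a high sample rate *)
audio = AudioGenerator["Pink", 1, SampleRate -> 96000];

(* Request 20 LPC coefficients per partition instead of the default 12 *)
lpc = AudioLocalMeasurements[audio, {"LPC", 20}];

(* Each partition contributes one row of coefficients *)
Dimensions[lpc["Values"]]
```

The second dimension of the result reflects the requested number of coefficients.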
Extract the frequencies of the formants of a signal:
Control the number of formants and LPC coefficients used for the calculation:
Frequency Domain Properties (8)
"SpectralCrest" measures the ratio between the maximum and the mean of the power spectrum:
"SpectralRollOff" measures the frequency below which 95% of the energy of the spectrum is concentrated:
"SpectralSlope" is a measure of the slope of the power spectrum:
"SpectralFlatness" is a measure of the flatness of the power spectrum:
Common statistical properties computed on the power spectrum:
The "FundamentalFrequency" property estimates the fundamental frequency of monophonic sounds:
Control the sensitivity of the detection:
Control the frequency range on which the detection is performed:
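Both controls can be combined in the {"FundamentalFrequency",thr,minfreq,maxfreq} form. The sketch below makes several assumptions: the test signal is a synthetic sine wave, the threshold 0.9 is an arbitrary illustrative value on an assumed 0-to-1 confidence scale, and the frequency limits are given as plain numbers in hertz.

```wolfram
(* Assumed test signal: one second of a 440 Hz sine wave *)
audio = AudioGenerator[{"Sin", 440}, 1];

(* Keep only estimates with confidence of at least 0.9 (illustrative value)
   in the assumed range from 200 Hz to 1000 Hz *)
f0 = AudioLocalMeasurements[audio, {"FundamentalFrequency", 0.9, 200, 1000}];

(* Partitions without a reliable estimate hold Missing[] and are left out of the plot *)
ListPlot[f0, PlotRange -> {0, 1000}]
```

Raising the threshold or narrowing the range typically leaves more partitions without an estimate.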
"HighFrequencyContent" computes the average of the power spectrum using weights that increase linearly with frequency:
The linear weighting of the spectrum assigns more importance to events happening in the higher end of the spectrum, making "HighFrequencyContent" a good candidate for transient detection.
The "MFCC" property returns 13 coefficients of the mel-frequency cepstrum:
Control the number of coefficients and number of filters, as well as the frequency range:
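The full {"MFCC",n,m,minfreq,maxfreq} specification might be used as in the following sketch. The pink-noise signal and the particular values (20 coefficients, 40 filters, 100 Hz to 8000 Hz) are assumptions chosen for illustration, with the frequency limits given as plain numbers in hertz.

```wolfram
(* Assumed test signal: one second of pink noise *)
audio = AudioGenerator["Pink", 1];

(* 20 coefficients computed from 40 filters between 100 Hz and 8000 Hz *)
mfcc = AudioLocalMeasurements[audio, {"MFCC", 20, 40, 100, 8000}];

(* One row of coefficients per partition *)
Dimensions[mfcc["Values"]]
```

The matrix of values can be visualized with MatrixPlot[Transpose[mfcc["Values"]]], which places time along the horizontal axis.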
Frequency Domain Properties Computed on Neighboring Partitions (2)
Speech & Speaker Properties (4)
The "VoiceActivity" property is an indicator function of voiced sections of a speech signal:
Show the voice activity with the audio waveform plot:
Use a smaller 10-millisecond window to increase the resolution:
The "SpeechFundamentalFrequency" property estimates the fundamental frequency of speech:
The "SpeechSpectralEnvelope" property returns the coefficients of the spectral envelope of the signal:
Plot the values of the result:
The "SpeechAperiodicity" property returns the coefficients of the aperiodic component of the signal:
Options (5)
Alignment (1)
The time stamps of the resulting TimeSeries are by default placed in the center of each partition:
Use Alignment->Right to place the computed property at the end of each partition:
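The effect of the option on the time stamps can be inspected directly. This is a minimal sketch; the sine test signal and the variable names center and right are assumptions for illustration.

```wolfram
(* Assumed test signal: one second of a 440 Hz sine wave *)
audio = AudioGenerator[{"Sin", 440}, 1];

(* Default alignment: time stamps at the center of each partition *)
center = AudioLocalMeasurements[audio, "RMSAmplitude"];

(* Time stamps at the end of each partition *)
right = AudioLocalMeasurements[audio, "RMSAmplitude", Alignment -> Right];

(* Compare the first few time stamps under each alignment *)
{Take[center["Times"], 3], Take[right["Times"], 3]}
```

Only the time stamps are expected to shift; the measured values for each partition should be unaffected by the alignment.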
PaddingSize (1)
Applications (4)
Detect the transients in a complex audio signal:
Compute a "detection function" by averaging several measurements from the original signal:
Filter the detection function using an adaptive threshold:
Find the peaks of the filtered detection function:
Plot the detected transients on the waveform:
Compute a signature for an audio object:
Compute the MFCC feature and extract the values:
Plot the resulting distance matrix:
Compare two recordings of the same sentence using dynamic time warping:
Compute and plot the MFCC features for the recordings:
Compute the dynamic time warping correspondence between two of the recordings using WarpingCorrespondence:
Plot the correspondence between the two recordings:
Use the "MFCC" measurement as a feature to compute the distance between various elements of the ExampleData["Audio"] collection:
Possible Issues (1)
"FundamentalFrequency" returns a Missing[] value for partitions in which the fundamental frequency cannot be estimated (the frame may contain silence or polyphonic sounds):
The fundamental frequency of a polyphonic sound is not defined:
Neat Examples (3)
Replicate frequency and amplitude of a flute note using AudioGenerator:
Compute the "RMSAmplitude" and "FundamentalFrequency" measurements:
Use the "FundamentalFrequency" measurement to control the frequency of the result:
Use the "RMSAmplitude" measurement to control the amplitude:
Calculate the RMS amplitude of the signal and round it:
Select only the points where there is a transient:
Make sure that the first point is at t=0 and compute the minimum time increment:
Define the Morse code mappings:
Create a 3D-printable model of the waveform of an audio object:
Compute the "Min" and "Max" measurements:
Wolfram Research (2016), AudioLocalMeasurements, Wolfram Language function, https://reference.wolfram.com/language/ref/AudioLocalMeasurements.html (updated 2020).