SpeakerMatchQ
SpeakerMatchQ[audio,ref]
gives True if the speaker features in audio match those from the reference ref, and False otherwise.
SpeakerMatchQ[{audio1,audio2,…},ref]
gives a list of results for each of the audioi.
SpeakerMatchQ[ref]
represents an operator form of SpeakerMatchQ that can be applied to an audio object.
Details and Options
- SpeakerMatchQ computes speaker features for audio and the reference ref and returns True if the distance between the speaker features is small enough to be acceptable.
- The reference ref can be any of the following:
  ref                      a single reference Audio object
  {ref1,ref2,…}            several possible references, tried in order
- The following options can be given:
  AcceptanceThreshold    0.5      minimum probability to consider acceptable
  Masking                All      interval of interest
  RecognitionPrior       0.5      prior probability for a True result
  TargetDevice           "CPU"    the target device on which to compute
- Use the Masking option to specify the interval of interest in any of the audioi. Possible settings include:
  All                            uses the whole audio
  {t1,t2}                        uses the interval t1 to t2
  {{t11,t12},{t21,t22},…}        uses the interval ti1 to ti2 from audioi
- SpeakerMatchQ uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
- SpeakerMatchQ may download resources that will be stored in your local object store at $LocalBase, and that can be listed using LocalObjects[] and removed using ResourceRemove.
Examples
Basic Examples (2)
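A minimal sketch of basic use, assuming rec and ref are Audio objects containing speech from the two speakers to be compared (placeholder names):
(* True if the speaker in rec is judged to match the speaker in ref *)
SpeakerMatchQ[rec, ref]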
Scope (3)
Test whether the speaker in a recording matches any of several references:
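A sketch, assuming rec is the recording to test and ref1, ref2, ref3 are reference recordings (placeholders); the references are tried in order and True is returned if any of them matches:
SpeakerMatchQ[rec, {ref1, ref2, ref3}]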
Test whether any of the speakers from a list of recordings matches a reference:
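A sketch, assuming rec1, rec2, rec3 are recordings to test against a single reference ref (placeholders); the result is a list of True/False values, one per recording:
SpeakerMatchQ[{rec1, rec2, rec3}, ref]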
Use SpeakerMatchQ in operator form:
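A sketch of the operator form, assuming ref is a reference recording and rec1, rec2 are recordings to test (placeholders):
matcher = SpeakerMatchQ[ref];
matcher[rec1]
(* the operator form maps conveniently over many recordings *)
matcher /@ {rec1, rec2}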
Options (4)
AcceptanceThreshold (1)
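A sketch, assuming rec and ref are Audio recordings (placeholders); raising AcceptanceThreshold above its default of 0.5 makes the test stricter, while lowering it makes it more permissive:
SpeakerMatchQ[rec, ref, AcceptanceThreshold -> 0.8]
SpeakerMatchQ[rec, ref, AcceptanceThreshold -> 0.2]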
Masking (2)
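A sketch, assuming rec, rec1, rec2 and ref are Audio recordings (placeholders) and that the masking intervals are given in seconds:
(* only analyze the interval from 1 to 3 seconds of rec *)
SpeakerMatchQ[rec, ref, Masking -> {1, 3}]
(* give a separate interval of interest for each input recording *)
SpeakerMatchQ[{rec1, rec2}, ref, Masking -> {{0, 2}, {1, 4}}]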
Applications (3)
Compare the speaker in a recording and a time-stretched version of it:
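A sketch using AudioTimeStretch, assuming rec is a speech recording (placeholder); stretching the recording by 25% should normally not change the identified speaker:
SpeakerMatchQ[AudioTimeStretch[rec, 1.25], rec]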
Compare the speaker in a recording and a pitch-shifted version of it:
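A sketch using AudioPitchShift, assuming rec is a speech recording (placeholder); here the pitch is shifted up by two semitones:
SpeakerMatchQ[AudioPitchShift[rec, 2^(2/12)], rec]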
In the Spoken Digit Command dataset, construct a speaker-match matrix for a subset of recordings:
Select 10 random speakers for which the dataset has between 2 and 5 samples:
Extract all recordings corresponding to these speakers and sort them by speaker ID:
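A sketch covering the steps above, assuming the Wolfram Data Repository resource name "Spoken Digit Commands" and columns named "SpeakerID" and "Audio" (these names are assumptions and may differ), and assuming the data contains at least 10 speakers with 2 to 5 samples each:
data = ResourceData["Spoken Digit Commands"];
(* pick 10 random speakers having between 2 and 5 recordings *)
counts = Counts[Normal[data[All, "SpeakerID"]]];
speakers = RandomSample[Keys[Select[counts, 2 <= # <= 5 &]], 10];
(* extract their recordings, sorted by speaker ID *)
subset = SortBy[Normal[data[Select[MemberQ[speakers, #SpeakerID] &]]], #SpeakerID &];
recs = Lookup[subset, "Audio"];
(* pairwise speaker-match matrix, visualized as 1 (match) and 0 (no match) *)
matrix = Outer[SpeakerMatchQ, recs, recs, 1];
ArrayPlot[matrix /. {True -> 1, False -> 0}]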
Properties & Relations (1)
SpeakerMatchQ computes speaker features on its input recordings and compares these embeddings.
From the Spoken Digit Command dataset, extract recordings from speakers who only have between 2 and 5 recordings:
Compute speaker features on each recording:
Visualize a sample of the computed features:
Compare the speaker features and plot a distance matrix on them:
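A sketch of building and visualizing a distance matrix, assuming features is a list of speaker-feature vectors, one per recording; cosine distance is used here as an illustrative choice, not necessarily the distance used internally:
dm = DistanceMatrix[features, DistanceFunction -> CosineDistance];
MatrixPlot[dm]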
Compute a binary distance matrix showing whether the speaker features match:
Compare with the result of SpeakerMatchQ; the differences arise because no voice is detected in some of the recordings:
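A sketch of the last two steps, assuming dm is the distance matrix from above and recs is the list of recordings; the cutoff 0.5 is an arbitrary illustration, not the internal threshold:
(* binary matrix: 1 where the feature distance is below the cutoff *)
binary = Map[Boole[# < 0.5] &, dm, {2}];
ArrayPlot[binary]
(* direct pairwise comparison with SpeakerMatchQ for reference *)
direct = Outer[SpeakerMatchQ, recs, recs, 1];
ArrayPlot[direct /. {True -> 1, False -> 0}]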
Possible Issues (1)
SpeakerMatchQ finds voiced intervals first and fails if no voice is detected in any one of the inputs:
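A sketch, assuming ref is a speech recording (placeholder); two seconds of generated white noise contain no voiced intervals, so the comparison cannot succeed:
noise = AudioGenerator["WhiteNoise", 2];
SpeakerMatchQ[noise, ref]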
Text
Wolfram Research (2020), SpeakerMatchQ, Wolfram Language function, https://reference.wolfram.com/language/ref/SpeakerMatchQ.html.
CMS
Wolfram Language. 2020. "SpeakerMatchQ." Wolfram Language & System Documentation Center. Wolfram Research. https://reference.wolfram.com/language/ref/SpeakerMatchQ.html.
APA
Wolfram Language. (2020). SpeakerMatchQ. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/SpeakerMatchQ.html