SpeakerMatchQ

SpeakerMatchQ[audio,ref]

gives True if the speaker features in audio match those of the reference ref, and False otherwise.

SpeakerMatchQ[{audio1,audio2,…},ref]

gives a list of results for each of the audioi.

SpeakerMatchQ[ref]

represents an operator form of SpeakerMatchQ that can be applied to an audio object.

Details and Options

  • SpeakerMatchQ computes speaker features for audio and the reference ref, and returns True if the distance between the speaker features is acceptable.
  • The reference ref can be any of the following:
  • ref	a single reference Audio object
    ref1|ref2|…	several possible references, tried in order
  • The following options can be given:
  • AcceptanceThreshold	0.5	minimum probability to consider acceptable
    Masking	All	interval of interest
    RecognitionPrior	0.5	prior probability for a True result
    TargetDevice	"CPU"	the target device on which to compute
  • Use the Masking option to specify the interval of interest in any of the audioi. Possible settings include:
  • All	uses the whole audio
    {t1,t2}	uses the interval t1 to t2
    {{t11,t12},{t21,t22},…}	uses the interval ti1 to ti2 from audioi
  • SpeakerMatchQ uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
  • SpeakerMatchQ may download resources that will be stored in your local object store at $LocalBase, and that can be listed using LocalObjects[] and removed using ResourceRemove.

Examples


Basic Examples  (2)

Check whether two recordings belong to the same speaker:
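A possible illustration; audio1 and audio2 here are assumed Audio objects containing speech (replace them with your own recordings):

```wolfram
(* two assumed recordings of speech *)
audio1 = Import["speaker-take1.wav"];
audio2 = Import["speaker-take2.wav"];

(* True if both recordings appear to come from the same speaker *)
SpeakerMatchQ[audio2, audio1]
```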

Compare the speaker in a recording and a time-stretched version of it:
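One way to sketch this, assuming audio is an Audio object containing speech:

```wolfram
(* slow the recording down by a factor of 1.5 *)
stretched = AudioTimeStretch[audio, 1.5];

(* the speaker should still be recognized as the same *)
SpeakerMatchQ[stretched, audio]
```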

Scope  (3)

Test whether the speaker in a recording matches any of several references:
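A sketch using the Alternatives form of the reference; ref1, ref2 and ref3 are assumed reference recordings of different speakers:

```wolfram
(* True if audio matches any of the references, tried in order *)
SpeakerMatchQ[audio, ref1 | ref2 | ref3]
```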

Test whether any of the speakers from a list of recordings matches a reference:
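A sketch with a list of assumed input recordings compared against a single assumed reference:

```wolfram
(* one True/False result is returned per input recording *)
SpeakerMatchQ[{audio1, audio2, audio3}, ref]
```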

Use SpeakerMatchQ in operator form:
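A sketch of the operator form, with ref and the audioi as assumed recordings:

```wolfram
(* SpeakerMatchQ[ref] is an operator that can be applied to audio *)
matchesRef = SpeakerMatchQ[ref];
matchesRef[audio1]   (* equivalent to SpeakerMatchQ[audio1, ref] *)

(* map the operator over several recordings *)
SpeakerMatchQ[ref] /@ {audio1, audio2}
```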

Options  (4)

AcceptanceThreshold  (1)

By default, 0.5 is used as the acceptance threshold:

Specify the minimum probability to consider acceptable:
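These two settings can be sketched as follows, with audio and ref as assumed recordings:

```wolfram
(* default: a match probability of at least 0.5 is accepted *)
SpeakerMatchQ[audio, ref]

(* require a higher probability before accepting a match *)
SpeakerMatchQ[audio, ref, AcceptanceThreshold -> 0.9]
```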

Masking  (2)

By default, the whole audio recording is compared, which may fail if it contains multiple speakers:

Specify an interval of interest within the recording to compare against the reference:
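A sketch of interval masking, assuming audio contains the speaker of interest between 1 and 3 seconds:

```wolfram
(* compare only the segment from 1 to 3 seconds against the reference *)
SpeakerMatchQ[audio, ref, Masking -> {1, 3}]
```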

Apply separate masking to each input audio in a list of recordings:
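A sketch of per-input masking, with assumed recordings and intervals; the i-th interval applies to the i-th input:

```wolfram
(* use 0-2 s of audio1 and 1-3 s of audio2 *)
SpeakerMatchQ[{audio1, audio2}, ref, Masking -> {{0, 2}, {1, 3}}]
```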

RecognitionPrior  (1)

Specify the prior probability that the speaker in a recording matches a reference:

Use a higher prior probability:
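A sketch of both settings, with audio and ref as assumed recordings:

```wolfram
(* assume a match is a priori unlikely *)
SpeakerMatchQ[audio, ref, RecognitionPrior -> 0.1]

(* assume a match is a priori likely *)
SpeakerMatchQ[audio, ref, RecognitionPrior -> 0.9]
```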

Applications  (3)

Compare the speaker in a recording and a time-stretched version of it:

Compare the speaker in a recording and a pitch-shifted version of it:
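One way to sketch this, assuming audio is an Audio object containing speech:

```wolfram
(* shift the pitch up by three semitones *)
shifted = AudioPitchShift[audio, 2^(3/12)];

(* compare the shifted version against the original *)
SpeakerMatchQ[shifted, audio]
```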

In the Spoken Digit Command dataset, construct a speaker-match matrix for a subset of recordings:

Select 10 random speakers for which the dataset has between 2 and 5 samples:

Extract all recordings corresponding to these speakers and sort them by speaker ID:

Compute and plot the matrix of matching speakers:
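The steps above can be sketched as follows; the resource name and the "SpeakerID" and "Audio" property names are assumptions about the dataset's layout:

```wolfram
data = ResourceData["Spoken Digit Commands"];

(* pick 10 random speakers with between 2 and 5 samples *)
counts = Counts[Normal[data[All, "SpeakerID"]]];
speakers = RandomSample[Keys[Select[counts, 2 <= # <= 5 &]], 10];

(* extract their recordings, sorted by speaker ID *)
subset = SortBy[Normal[data[Select[MemberQ[speakers, #SpeakerID] &]]], #SpeakerID &];
recordings = subset[[All, "Audio"]];

(* pairwise speaker-match matrix; matching speakers cluster on the diagonal *)
matrix = Outer[Boole[SpeakerMatchQ[#1, #2]] &, recordings, recordings, 1];
MatrixPlot[matrix]
```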

Properties & Relations  (1)

SpeakerMatchQ computes speaker features on its input recordings and compares these embeddings.

From the Spoken Digit Command dataset, extract recordings from speakers who only have between 2 and 5 recordings:

Compute speaker features on each recording:

Visualize a sample of the computed features:

Compare the speaker features and plot a distance matrix on them:

Compute a binary distance matrix showing whether the speaker features match:
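A sketch of the comparison, assuming features is a list of speaker-feature vectors (one per recording) and using cosine distance with an assumed cutoff of 0.25:

```wolfram
(* distance matrix between the feature vectors *)
d = DistanceMatrix[features, DistanceFunction -> CosineDistance];
MatrixPlot[d]

(* threshold the distances to get a binary match matrix *)
MatrixPlot[Map[Boole[# < 0.25] &, d, {2}]]
```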

Compare with the result of SpeakerMatchQ; the differences arise because no voice is detected in some of the recordings:

Possible Issues  (1)

SpeakerMatchQ finds voiced intervals first and fails if no voice is detected in any one of the inputs:
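A sketch of this failure mode; ref is an assumed speech recording, while a pure tone contains no voice:

```wolfram
(* a 2-second 440 Hz sine tone has no voiced intervals *)
tone = AudioGenerator[{"Sine", 440}, 2];

(* the comparison cannot proceed without detected voice *)
SpeakerMatchQ[tone, ref]
```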

Introduced in 2020 (12.1)