---
title: "SpeakerMatchQ"
language: "en"
type: "Symbol"
summary: "SpeakerMatchQ[audio, ref] gives True if the speaker in audio matches the speaker in the reference ref and False otherwise. SpeakerMatchQ[{audio1, audio2, ...}, ref] gives a list of results for each of the audioi. SpeakerMatchQ[ref] represents an operator form of SpeakerMatchQ that can be applied to an audio object."
keywords: 
- speaker analysis
- same speaker
- test for speaker match
- test for same speaker
- speaker characteristic
- speaker matching
- speaker identification
- speaker detection
- nearest speaker
canonical_url: "https://reference.wolfram.com/language/ref/SpeakerMatchQ.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Machine Learning"
    link: "https://reference.wolfram.com/language/guide/MachineLearning.en.md"
  - 
    title: "Speech Computation"
    link: "https://reference.wolfram.com/language/guide/SpeechComputation.en.md"
related_functions: 
  - 
    title: "FeatureExtractor"
    link: "https://reference.wolfram.com/language/ref/FeatureExtractor.en.md"
  - 
    title: "AudioDistance"
    link: "https://reference.wolfram.com/language/ref/AudioDistance.en.md"
  - 
    title: "Classify"
    link: "https://reference.wolfram.com/language/ref/Classify.en.md"
  - 
    title: "SpeechRecognize"
    link: "https://reference.wolfram.com/language/ref/SpeechRecognize.en.md"
  - 
    title: "SpeechCases"
    link: "https://reference.wolfram.com/language/ref/SpeechCases.en.md"
  - 
    title: "SpeechInterpreter"
    link: "https://reference.wolfram.com/language/ref/SpeechInterpreter.en.md"
  - 
    title: "AudioInstanceQ"
    link: "https://reference.wolfram.com/language/ref/AudioInstanceQ.en.md"
---
[EXPERIMENTAL]

# SpeakerMatchQ

SpeakerMatchQ[audio, ref] gives True if the speaker in audio matches the speaker in the reference ref and False otherwise.

SpeakerMatchQ[{audio1, audio2, …}, ref] gives a list of results for each of the audioi.

SpeakerMatchQ[ref] represents an operator form of SpeakerMatchQ that can be applied to an audio object.

## Details and Options

* ``SpeakerMatchQ`` computes speaker features for ``audio`` and the reference ``ref`` and returns ``True`` if the distance between the features is small enough to be considered a match.

* The reference ``ref`` can be any of the following:

|                   |                                             |
| ----------------- | ------------------------------------------- |
| ref               | a single reference Audio object             |
| ref1 \| ref2 \| … | several possible references, tried in order |
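The alternatives form can also be combined with the operator form described above; a minimal sketch, assuming the alternatives form is accepted in the operator position as the reference spec suggests:

```wl
(* a sketch: an operator that tests against either of two references *)
matchEither = SpeakerMatchQ[
   ExampleData[{"Audio", "MaleVoice"}] | ExampleData[{"Audio", "FemaleVoice"}]];

(* apply it to an audio object *)
matchEither[ExampleData[{"Audio", "FemaleVoice"}]]
```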

* The following options can be given:

|                      |       |                                            |
| -------------------- | ----- | ------------------------------------------ |
| AcceptanceThreshold  | 0.5   | minimum probability to consider acceptable |
| Masking              | All   | interval of interest                       |
| RecognitionPrior     | 0.5   | prior probability for a True result        |
| TargetDevice         | "CPU" | the target device on which to compute      |
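Of these, ``TargetDevice`` is the only option not demonstrated in the examples below; a minimal sketch of requesting GPU evaluation, assuming a supported GPU is configured on the system:

```wl
(* a sketch: run the underlying feature computation on a GPU, if one is available *)
a = ExampleData[{"Audio", "FemaleVoice"}];
SpeakerMatchQ[a, a, TargetDevice -> "GPU"]
```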

* Use the ``Masking`` option to specify the interval of interest in any of the ``audioi``. Possible settings include:

|                             |                                          |
| --------------------------- | ---------------------------------------- |
| All                         | uses the whole audio                     |
| {t1, t2}                    | uses the interval t1 to t2               |
| {{t11, t12}, {t21, t22}, …} | uses the interval ti1 to ti2 from audioi |

* ``SpeakerMatchQ`` uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.

* ``SpeakerMatchQ`` may download resources that will be stored in your local object store at ``$LocalBase``, and that can be listed using ``LocalObjects[]`` and removed using ``ResourceRemove``.
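The downloaded resources can be inspected and managed like any other local objects; this is a general resource-management sketch, not specific to ``SpeakerMatchQ``:

```wl
(* a sketch: list the objects stored in the local object store at $LocalBase *)
LocalObjects[]

(* a downloaded resource object obj can then be removed with ResourceRemove[obj] *)
```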

---

## Examples (14)

### Basic Examples (2)

Check whether two recordings belong to the same speaker:

```wl
In[1]:= SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]]

Out[1]= False
```

---

Compare the speaker in a recording and a time-stretched version of it:

```wl
In[1]:= SpeakerMatchQ[AudioTimeStretch[ExampleData[{"Audio", "FemaleVoice"}], 1.5], ExampleData[{"Audio", "FemaleVoice"}]]

Out[1]= True
```

### Scope (3)

Test whether the speaker in a recording matches any of several references:

```wl
In[1]:= SpeakerMatchQ[\!\(\*AudioBox["![Embedded Audio Player](audio://content-pf50d)"]\), ExampleData[{"Audio", "MaleVoice"}] | ExampleData[{"Audio", "FemaleVoice"}]]

Out[1]= True
```

---

Test whether any of the speakers from a list of recordings matches a reference:

```wl
In[1]:= list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};

In[2]:= SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]

Out[2]= {True, False}
```

---

Use ``SpeakerMatchQ`` in operator form:

```wl
In[1]:= list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};

In[2]:= GroupBy[list, SpeakerMatchQ[ExampleData[{"Audio", "MaleVoice"}]]]

Out[2]= <|True -> {\!\(\*AudioBox[...]\)}, False -> {\!\(\*AudioBox[...]\)}|>
```

### Options (4)

#### AcceptanceThreshold (1)

By default, 0.5 is used as the acceptance threshold:

```wl
In[1]:= a = ExampleData[{"Audio", "FemaleVoice"}];

In[2]:= SpeakerMatchQ[a, AudioPitchShift[a, .2]]

Out[2]= False
```

Specify the minimum probability to consider acceptable:

```wl
In[3]:= SpeakerMatchQ[a, AudioPitchShift[a, .2], AcceptanceThreshold -> .1]

Out[3]= True
```

#### Masking (2)

By default, the whole audio recording is compared, which may fail if it contains multiple speakers:

```wl
In[1]:= a = AudioJoin[{ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]}];

In[2]:= SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}]]

Out[2]= False
```

Specify an interval of interest within the recording to compare against the reference:

```wl
In[3]:= SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}], Masking -> {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}]

Out[3]= True
```

---

Apply separate masking to each input audio in a list of recordings:

```wl
In[1]:=
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];
list = AudioJoin /@ {{a, b}, {b, a}};

In[2]:= SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]

Out[2]= {False, False}

In[3]:= SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}], Masking -> {{Quantity[0, "Seconds"], Quantity[2.4, "Seconds"]}, {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}}]

Out[3]= {True, True}
```

#### RecognitionPrior (1)

Specify the prior probability that the speaker in a recording matches a reference:

```wl
In[1]:= SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .5]

Out[1]= False
```

Use a higher prior probability:

```wl
In[2]:= SpeakerMatchQ[AudioAmplify[ExampleData[{"Audio", "MaleVoice"}], .9], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .8]

Out[2]= True
```

### Applications (3)

Compare the speaker in a recording and a time-stretched version of it:

```wl
In[1]:= a = ExampleData[{"Audio", "FemaleVoice"}];

In[2]:= list = Table[AudioTimeStretch[a, s], {s, 1, 5, .5}];

In[3]:= SpeakerMatchQ[list, a]

Out[3]= {True, True, True, True, True, True, True, True, True}
```

---

Compare the speaker in a recording and a pitch-shifted version of it:

```wl
In[1]:= a = ExampleData[{"Audio", "MaleVoice"}];

In[2]:=
list = Table[AudioPitchShift[a, s, Method -> "Speech"], {s, 1, 2, .2}];
SpeakerMatchQ[list, a]

Out[2]= {True, True, True, True, False, False}
```

---

In the [Spoken Digit Commands](https://datarepository.wolframcloud.com/resources/Spoken-Digit-Commands-Dataset) dataset, construct a speaker-match matrix for a subset of recordings:

```wl
In[1]:=
testdata = ResourceData["Spoken Digit Commands", "TestData"];
Length[testdata]

Out[1]= 1000

In[2]:= RandomSample[testdata, 3]//Dataset

Out[2]= [Dataset of 3 rows, each with an "Input" key (an Audio object), a "SpeakerID" key and an "Output" key]
```

Select 10 random speakers for which the dataset has between 2 and 5 samples:

```wl
In[3]:= speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];
```

Extract all recordings corresponding to these speakers and sort them by speaker ID:

```wl
In[4]:=
testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];
```

Compute and plot the matrix of matching speakers:

```wl
In[5]:= DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]//MatrixPlot

Out[5]= [image]
```

### Properties & Relations (1)

``SpeakerMatchQ`` computes speaker features on its input recordings and compares these embeddings.

From the [Spoken Digit Commands](https://datarepository.wolframcloud.com/resources/Spoken-Digit-Commands-Dataset) dataset, extract recordings from speakers who only have between 2 and 5 recordings:

```wl
In[1]:=
testdata = ResourceData["Spoken Digit Commands", "TestData"];
speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];
testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];
```

Compute speaker features on each recording:

```wl
In[2]:= features = FeatureExtract[testsubset[[All, "Input"]], "SpeakerFeatureVector"];
```

Visualize one of the computed feature vectors:

```wl
In[3]:= RandomChoice[features]//ListPlot[#, Filling -> Axis]&

Out[3]= [image]
```

Compare the speaker features and plot a distance matrix on them:

```wl
In[4]:=
distances = DistanceMatrix[features, DistanceFunction -> CosineDistance];
MatrixPlot[distances]

Out[4]= [image]
```

Compute a binary distance matrix showing whether the speaker features match:

```wl
In[5]:= DistanceMatrix[features, DistanceFunction -> (Boole[CosineDistance[##] ≤ .4]&)]//MatrixPlot

Out[5]= [image]
```

Compare with the result of ``SpeakerMatchQ``; the differences occur because no voice is detected in some of the recordings:

```wl
In[6]:= Quiet[DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]]//MatrixPlot

Out[6]= [image]
```

### Possible Issues (1)

``SpeakerMatchQ`` finds voiced intervals first and fails if no voice is detected in any of the inputs:

```wl
In[1]:= list = {ExampleData[{"Audio", "FemaleVoice"}, "Audio"], ExampleData[{"Audio", "IRStairway"}, "Audio"], ExampleData[{"Audio", "NoisyTalk"}, "Audio"]};

In[2]:= SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]
```

SpeakerMatchQ::novoice: No voice detected in the signal at position 2.

```wl
Out[2]= SpeakerMatchQ[{\!\(\*AudioBox["![Embedded Audio Player](audio://content-txc1r)"]\), \!\(\*AudioBox["![Embedded Audio Player](audio://content-4iort)"]\), \!\(\*AudioBox["![Embedded Audio Player](audio://content-151lx)"]\)}, \!\(\*AudioBox[...]\)]
```

## See Also

* [`FeatureExtractor`](https://reference.wolfram.com/language/ref/FeatureExtractor.en.md)
* [`AudioDistance`](https://reference.wolfram.com/language/ref/AudioDistance.en.md)
* [`Classify`](https://reference.wolfram.com/language/ref/Classify.en.md)
* [`SpeechRecognize`](https://reference.wolfram.com/language/ref/SpeechRecognize.en.md)
* [`SpeechCases`](https://reference.wolfram.com/language/ref/SpeechCases.en.md)
* [`SpeechInterpreter`](https://reference.wolfram.com/language/ref/SpeechInterpreter.en.md)
* [`AudioInstanceQ`](https://reference.wolfram.com/language/ref/AudioInstanceQ.en.md)

## Related Guides

* [Machine Learning](https://reference.wolfram.com/language/guide/MachineLearning.en.md)
* [Speech Computation](https://reference.wolfram.com/language/guide/SpeechComputation.en.md)

## History

* [Introduced in 2020 (12.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn121.en.md)