SpeechRecognize

Details and Options



- Speech recognition aims to convert a spoken audio signal to text. It is also known as speech-to-text and is typically used in voice-enabled human-machine interactions and digital personal assistants.
- SpeechRecognize[audio] returns all recognized speech in audio as a single string.
- Structural elements specified in level include:
    Automatic     speech found in the whole audio signal (default)
    "Segment"     a list of transcription segments
    "Sentence"    a list of sentences
    "Word"        a list of words
- The property prop can be one of the following:
    "Audio"            trimmed audio containing the recognized text
    "Confidence"       strength of the recognized text
    "Interval"         interval containing the text
    "SubtitleRules"    a list of time intervals and texts
    "Text"             recognized text (default)
    {prop1,prop2,…}    a list of properties
- The following options can be given:
    Language             Automatic             the language to recognize
    Masking              All                   interval of interest
    Method               Automatic             the method to use
    PerformanceGoal      $PerformanceGoal      aspects of performance to try to optimize
    ProgressReporting    $ProgressReporting    whether to report the progress of the computation
    TargetDevice         "CPU"                 the device on which to perform recognition
- Use Language->lang1->lang2 to recognize speech assumed to be in language lang1 and return translated text in language lang2.
- By default, speech in the whole signal is recognized. Use Masking->{int1,int2,…} to limit the recognition to intervals inti.
- Possible settings for Method are:
    Automatic          automatic method
    "GoogleSpeech"     uses Google speech-to-text
    "NeuralNetwork"    uses built-in neural networks
    "OpenAI"           uses OpenAI speech-to-text
- By default, if a method returns non-speech tokens (e.g. [applause]), they are returned in the result. Use Method->{method,"NonSpeechReplacement"->replacements} to specify different replacements. Use "NonSpeechReplacement"->"" to remove them.
- SpeechRecognize works for English speech as well as various other languages, such as Chinese, Dutch, French, Japanese and Spanish.
- SpeechRecognize uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
- SpeechRecognize may download resources that will be stored in your local object store at $LocalBase, and can be listed using LocalObjects[] and removed using ResourceRemove.
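
The translation and non-speech settings described above can be sketched as follows. This is only an illustration: the file name is hypothetical, the setting forms are taken from the notes in this section, and the returned text depends on the model and version in use.

```wolfram
(* Hypothetical file: any recording of French speech *)
audio = Import["interview-fr.wav"];

(* Treat the speech as French and return English text *)
SpeechRecognize[audio, Language -> "French" -> "English"]

(* Use the built-in networks and drop non-speech tokens such as [applause] *)
SpeechRecognize[audio, Method -> {"NeuralNetwork", "NonSpeechReplacement" -> ""}]
```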

Examples
Basic Examples (2)
Scope (4)
Basic Uses (2)
Recognize speech in a short audio track:
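
A minimal sketch of this case, assuming the example data entry {"Audio", "MaleVoice"} is available in your installation; any Audio object containing speech should work in its place.

```wolfram
(* Assumed example data entry: a short clip of a single speaker *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* With no level or property given, the result is one string *)
SpeechRecognize[audio]
```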

Recognize speech in an audio track of a video file:
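
A possible form of this call, assuming a Video object can be passed directly in place of audio, as the caption suggests. The file path is hypothetical.

```wolfram
(* Hypothetical path to a video file with a spoken audio track *)
video = Video["lecture.mp4"];

(* Recognize the speech in the video's audio track *)
SpeechRecognize[video]
```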

Recognize speech in a non-English language:

Classify the language from the recognized text:
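
One way to do this is to pass the recognized string to LanguageIdentify. The sketch below assumes the example data entry {"Audio", "MaleVoice"}; substitute your own recording as needed.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Recognize first, then identify the language of the resulting text *)
text = SpeechRecognize[audio];
LanguageIdentify[text]
```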

Classify the language from the original audio:

Level Specification (1)
By default, all recognized text is returned as one string:

Extract a list of recognized sentences:
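
A short sketch of the level argument, using the "Sentence" level listed under Details and Options. The example data entry is an assumption; the other levels ("Segment", "Word") are used the same way.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Return a list of sentences instead of a single string *)
SpeechRecognize[audio, "Sentence"]
```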

Extract a list of segments, typically used for splitting text for subtitles:

Properties (1)
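As a sketch of the properties listed under Details and Options, the calls below request a property, or a list of properties, as a third argument after the level. The example data entry is an assumption, and the exact shape of the returned values may differ between versions.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Text and time interval for each recognized word *)
SpeechRecognize[audio, "Word", {"Text", "Interval"}]

(* Time intervals paired with text for each segment, e.g. for subtitles *)
SpeechRecognize[audio, "Segment", "SubtitleRules"]
```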
Options (3)
Masking (1)
Use the Masking option to recognize parts of a signal:
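
A minimal sketch, assuming each interval is written as a {start, end} pair in seconds, which is the usual form for audio intervals in the Wolfram Language. The interval values are illustrative only.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Recognize only the first second of the signal *)
SpeechRecognize[audio, Masking -> {{0, 1}}]
```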

Method (1)
By default, a local model is used for speech recognition:

Use OpenAI speech recognition:
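
A sketch of the Method setting, using the method names listed under Details and Options. External services presumably need a configured account before this call succeeds; that requirement is an assumption, not something stated on this page.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Send the audio to the OpenAI speech-to-text service *)
SpeechRecognize[audio, Method -> "OpenAI"]

(* For comparison, explicitly request the built-in neural networks *)
SpeechRecognize[audio, Method -> "NeuralNetwork"]
```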

Use GoogleSpeech speech recognition:

PerformanceGoal (1)
By default, a medium-speed model with moderate quality is used:

Get the higher-quality result:
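
A sketch using the standard PerformanceGoal settings "Quality" and "Speed". It is assumed here that SpeechRecognize accepts both; wrapping the calls in AbsoluteTiming is one way to compare their cost.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Favor transcription quality over speed *)
AbsoluteTiming[SpeechRecognize[audio, PerformanceGoal -> "Quality"]]

(* Favor speed over transcription quality *)
AbsoluteTiming[SpeechRecognize[audio, PerformanceGoal -> "Speed"]]
```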

A balanced speed and quality result:

Applications (4)
Use AudioIntervals to select which parts of the signal to recognize:
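
One possible arrangement, assuming the intervals returned by AudioIntervals can be passed straight to the Masking option. With a single argument, AudioIntervals is taken here to return the audible intervals of the signal; a criterion can be supplied as a second argument to select other parts.

```wolfram
(* Assumed example data entry *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Audible intervals, returned as a list of {start, end} pairs *)
intervals = AudioIntervals[audio];

(* Restrict recognition to those intervals *)
SpeechRecognize[audio, Masking -> intervals]
```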

Show the recognized city on the map:
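
A sketch of one way to do this, assuming a hypothetical recording of a spoken city name. Interpreter["City"] is used to turn the recognized string into a city entity; it may fail if the transcription contains extra words or punctuation.

```wolfram
(* Hypothetical recording of someone saying a city name *)
audio = Import["city.wav"];

(* Interpret the recognized text as a city entity *)
city = Interpreter["City"][SpeechRecognize[audio]];

(* Mark the city on a map *)
GeoGraphics[GeoMarker[city]]
```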

Find the answer from a spoken question in a text:
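
A minimal sketch using FindTextualAnswer, with the recognized string as the question. Both the passage and the recording are hypothetical placeholders.

```wolfram
(* Illustrative passage to search *)
text = "The Moon orbits Earth at an average distance of about 384,400 km. \
It completes one orbit roughly every 27.3 days.";

(* Hypothetical recording of a spoken question about the passage *)
question = SpeechRecognize[Import["question.wav"]];

(* Find the part of the text that best answers the question *)
FindTextualAnswer[text, question]
```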

Build an automatic assistant based on Wolfram|Alpha:
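
A rough sketch of such an assistant, under the assumption that the spoken query is available as an Audio object. The function name assistant and the file name are hypothetical; WolframAlpha is called with the "SpokenResult" format and the answer is read back with SpeechSynthesize.

```wolfram
(* Hypothetical helper: recognize a spoken query, ask Wolfram|Alpha, speak the answer *)
assistant[audio_] := Module[{query, answer},
  query = SpeechRecognize[audio];
  answer = WolframAlpha[query, "SpokenResult"];
  (* Only synthesize speech when a textual answer came back *)
  If[StringQ[answer], SpeechSynthesize[answer], Missing["NoAnswer"]]
]

(* Hypothetical recording of a spoken question *)
assistant[Import["query.wav"]]
```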

Wolfram Research (2019), SpeechRecognize, Wolfram Language function, https://reference.wolfram.com/language/ref/SpeechRecognize.html (updated 2024).