"GoogleSpeech" (Service Connection)
Use Google Text-to-Speech and Speech-to-Text APIs with the Wolfram Language.
Connecting & Authenticating
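A connection is typically opened with ServiceConnect. The following is a minimal sketch that assumes a Google account with access to the Speech APIs; the variable name googleSpeech is illustrative only:

```wolfram
(* Open a connection to the service; credentials may be requested on first use *)
googleSpeech = ServiceConnect["GoogleSpeech"]
```

The resulting ServiceObject can be given to ServiceExecute in place of the service name.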
Synthesize Audio from Text
"ListVoices" — returns a list of available voice styles
Language | All | restrict the query to voices able to synthesize a given language
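As a sketch of how this request might be called, the first line lists every voice and the second restricts the list by language. Giving the language as the string "French" is an assumption:

```wolfram
(* List all available voice styles *)
ServiceExecute["GoogleSpeech", "ListVoices"]

(* Restrict the list to voices for one language (string language name assumed) *)
ServiceExecute["GoogleSpeech", "ListVoices", {Language -> "French"}]
```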
"Synthesize" — returns speech synthesized from text
"Input" | (required) | text to synthesize
"Voice" | Automatic | name of the synthesis voice
Language | Automatic | language of the synthesis voice
"Pitch" | Automatic | semitone deviation from the native voice pitch
"Rate" | Automatic | factor by which to change the native voice speed
AudioEncoding | Automatic | output audio encoding
GeneratedAssetLocation | $GeneratedAssetLocation | storage location of the synthesized audio
GeneratedAssetFormat | Automatic | output format of the synthesized audio
"EffectsProfileID" | Automatic | name of the post-processing effect applied to the speech
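The parameters above might be combined as in the following sketch. The input text and the numeric values for "Pitch" and "Rate" are illustrative assumptions, not recommended settings:

```wolfram
(* Synthesize speech with an adjusted pitch, rate and playback profile *)
ServiceExecute["GoogleSpeech", "Synthesize", {
  "Input" -> "The quick brown fox jumps over the lazy dog.",
  "Pitch" -> 2,    (* two semitones above the native voice pitch *)
  "Rate" -> 0.9,   (* slightly slower than the native voice speed *)
  "EffectsProfileID" -> "small-bluetooth-speaker-class-device"
}]
```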
Recognize Text from Audio
"Recognize" — returns text transcribed from audio
"Input" | (required) | audio to transcribe
Language | "English" | language(s) of the contained speech
"ChannelRecognition" | False | whether to transcribe each channel separately
MaxItems | 1 | maximum number of hypotheses to return
"ProfanityFilter" | False | whether to attempt to replace profanities
"SpeechContexts" | {} | phrase hints to assist transcription
"WordTimeOffsets" | True | whether to return word time offsets with the result
"WordConfidence" | False | whether to return word confidence values with the result
"Punctuation" | True | whether to include punctuation in the transcription
"SpokenPunctuation" | False | whether to replace spoken punctuation with the corresponding ASCII characters
"SpokenEmojis" | False | whether to replace spoken emojis with the corresponding Unicode characters
"SpeakerDiarization" | False | whether to tag distinct speakers in the result
"Model" | Automatic | model to use for the request
MetaInformation | None | metadata describing the input audio
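Several of these parameters can be set in one request. The following sketch assumes that the sample recording ExampleData[{"Audio", "MaleVoice"}] is available; the variable name audio and the chosen settings are illustrative:

```wolfram
(* Sample speech recording; any Audio object containing speech could be used instead *)
audio = ExampleData[{"Audio", "MaleVoice"}];

(* Transcribe with a short-form model, filter profanities and request word confidences *)
ServiceExecute["GoogleSpeech", "Recognize", {
  "Input" -> audio,
  Language -> "English",
  "Model" -> "latest_short",
  "ProfanityFilter" -> True,
  "WordConfidence" -> True
}]
```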
Parameter Details
Settings for "SpeechContexts" can take the following forms:
str -> w | give weight w to the string str
{str_1 -> w_1, str_2 -> w_2, …} | give weight w_i to the string str_i
Possible settings for "EffectsProfileID" include:
"large-automotive-class-device" | optimized for car speakers
"small-bluetooth-speaker-class-device" | optimized for small home speakers
Possible settings for "Model" include:
"latest_long" | optimized for long-form content
"latest_short" | optimized for short-form content
"command_and_search" | optimized for short queries
Basic Examples (1)
Scope (2)
Speech Synthesis (1)
Synthesize text in a different language. With Language set to Automatic, the language is inferred from the input text; a particular language can also be specified. The service attempts to select a voice style for the requested language:
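A minimal sketch of both cases; the Spanish sentence and the string form of the language name are assumptions made for illustration:

```wolfram
(* Let the service infer the language from the input text *)
ServiceExecute["GoogleSpeech", "Synthesize",
  {"Input" -> "Buenos días, ¿cómo está usted?"}]

(* Request a particular language explicitly *)
ServiceExecute["GoogleSpeech", "Synthesize",
  {"Input" -> "Buenos días, ¿cómo está usted?", Language -> "Spanish"}]
```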
Speech Recognition (1)
Transcribe text from audio containing speech:
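For instance, assuming the sample recording ExampleData[{"Audio", "MaleVoice"}] is available, a basic request might look like this:

```wolfram
(* Transcribe a sample recording using the default settings *)
ServiceExecute["GoogleSpeech", "Recognize",
  {"Input" -> ExampleData[{"Audio", "MaleVoice"}]}]
```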
By default, everything from the API response is returned, including information about recognized words:
Return multiple guesses of the transcription:
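One way to do this is to raise MaxItems above its default of 1. The value 3 and the sample recording are illustrative assumptions:

```wolfram
(* Ask for up to three transcription hypotheses *)
ServiceExecute["GoogleSpeech", "Recognize", {
  "Input" -> ExampleData[{"Audio", "MaleVoice"}],
  MaxItems -> 3
}]
```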
Separate different speakers from a recording:
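A sketch of such a request, assuming a recording with more than one speaker. The file name conversation.wav is hypothetical and stands in for any suitable Audio object:

```wolfram
(* Hypothetical recording that contains several speakers *)
conversation = Audio["conversation.wav"];

(* Tag the distinct speakers in the transcription *)
ServiceExecute["GoogleSpeech", "Recognize", {
  "Input" -> conversation,
  "SpeakerDiarization" -> True
}]
```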
Specify the minimum and maximum number of speakers:
Display labeled words in a Dataset. The API currently returns speaker labels in the second result: