Speech Computation

Speech computation consists of processing speech signals and analyzing them to infer information. Operations include changing the speaker's pitch, detecting voiced intervals and recognizing the speaker or the speech. The Wolfram Language provides built-in, fully integrated audio processing, statistical analysis, visualization and machine learning, which make speech computations easy to prototype and highly efficient.
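
As a rough illustration of how these operations combine, the following minimal sketch synthesizes a sentence, shifts its pitch, plots both signals and transcribes the result. The sentence and the ratio 1.5 are arbitrary placeholders chosen for this example, and SpeechRecognize may need to download resources on first use.

```wolfram
(* Synthesize a short utterance; the sentence is an arbitrary placeholder *)
speech = SpeechSynthesize["The quick brown fox jumps over the lazy dog."];

(* Scale every frequency by the ratio 1.5 (an assumed value), keeping the duration *)
shifted = AudioPitchShift[speech, 1.5];

(* Compare the spectrograms of the original and the shifted signal *)
{Spectrogram[speech], Spectrogram[shifted]}

(* Convert the original spoken signal back to text *)
SpeechRecognize[speech]
```

Each step takes and returns an ordinary Audio object, so results from one function can be passed directly to the next.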

Generating & Importing Speech »

SpeechSynthesize synthesize a speech signal from text

AudioCapture capture a speech signal from an input device

Audio  ▪  Import  ▪  WebAudioSearch  ▪  ExampleData  ▪  ResourceData  ▪  ...

Visualization

Spectrogram plot the spectrogram of a speech signal

Cepstrogram  ▪  Periodogram  ▪  AudioPlot

Understanding Speech

SpeechRecognize convert a spoken audio signal to text (speech-to-text)

LanguageIdentify  ▪  SpeechCases  ▪  SpeechInterpreter  ▪  PitchRecognize  ▪  SpeakerMatchQ

Speech Analysis

AudioIntervals find voiced or unvoiced intervals

AudioLoudness  ▪  AudioLocalMeasurements  ▪  ShortTimeFourier

Speech Manipulation

AudioPitchShift apply pitch shifting to a speech signal

AudioTimeStretch  ▪  AudioFrequencyShift

Speech Synthesis

SpeechSynthesize produce a spoken signal from text

VoiceStyleData  ▪  $VoiceStyles

Machine Learning »

Classify perform classification on a collection of speech signals

FeatureSpacePlot  ▪  FeatureSpacePlot3D  ▪  FeatureExtractor  ▪  Nearest  ▪  ...

Neural Networks »

NetModel use pre-trained nets for speech analysis

NetEncoder  ▪  "Audio"  ▪  "AudioMFCC"  ▪  "AudioMelSpectrogram"  ▪  ...

NetTrain  ▪  GatedRecurrentLayer  ▪  LongShortTermMemoryLayer  ▪  CTCLossLayer  ▪  ...

Labeling & Annotations

AudioAnnotate annotate an audio object with the result of an analysis

AnnotationKeys  ▪  AnnotationValue  ▪  AnnotationDelete

Audio Manipulation »

AudioTrim extract an interesting part of a speech signal

AudioJoin  ▪  AudioReplace  ▪  LowpassFilter  ▪  WienerFilter  ▪  ...