Speech Computation
Speech computation consists of processing speech signals and analyzing them to infer information. Typical operations include changing the speaker's pitch, detecting voiced intervals, and recognizing the speaker or the spoken words. The Wolfram Language provides built-in, fully integrated audio processing, statistical analysis, visualization and machine learning, which together make speech computations easy to prototype and efficient to run.
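As a rough illustration of the operations named above, the following minimal sketch chains a few of the functions listed on this page. It assumes a system speech synthesizer is available and that SpeechRecognize can obtain any resources it needs on first use. The sample sentence and the pitch ratio 1.2 are arbitrary choices for illustration.

```wolfram
(* Synthesize a short utterance; assumes a system voice is installed *)
speech = SpeechSynthesize["The quick brown fox jumps over the lazy dog."];

(* Visualize the time-frequency content of the signal *)
Spectrogram[speech]

(* Raise the pitch by a ratio of 1.2 (illustrative value) *)
shifted = AudioPitchShift[speech, 1.2];

(* Convert the pitch-shifted audio back to text *)
SpeechRecognize[shifted]
```

The recognized text may differ slightly from the input sentence, depending on the voice used and on how much the pitch shift distorts the signal.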
Generating & Importing Speech
SpeechSynthesize — synthesize a speech signal from text
AudioCapture — capture a speech signal from an input device
Audio ▪ Import ▪ WebAudioSearch ▪ ExampleData ▪ ResourceData ▪ ...
Visualization
Spectrogram — plot the spectrogram of a speech signal
Cepstrogram ▪ Periodogram ▪ AudioPlot
Understanding Speech
SpeechRecognize — convert a spoken audio signal to text
LanguageIdentify ▪ SpeechCases ▪ SpeechInterpreter ▪ PitchRecognize ▪ SpeakerMatchQ
Speech Analysis
AudioIntervals — find voiced or unvoiced intervals
AudioLoudness ▪ AudioLocalMeasurements ▪ ShortTimeFourier
Speech Manipulation
AudioPitchShift — apply pitch shifting to a speech signal
AudioTimeStretch ▪ AudioFrequencyShift
Speech Synthesis
SpeechSynthesize — produce a spoken signal from text
Machine Learning
Classify — perform classification on a collection of speech signals
FeatureSpacePlot ▪ FeatureSpacePlot3D ▪ FeatureExtractor ▪ Nearest ▪ ...
Neural Networks
NetModel — use pre-trained nets for speech analysis
NetEncoder ▪ "Audio" ▪ "AudioMFCC" ▪ "AudioMelSpectrogram" ▪ ...
NetTrain ▪ GatedRecurrentLayer ▪ LongShortTermMemoryLayer ▪ CTCLossLayer ▪ ...
Labeling & Annotations
AudioAnnotate — annotate an audio object with the result of an analysis
AnnotationKeys ▪ AnnotationValue ▪ AnnotationDelete
Audio Manipulation
AudioTrim — extract an interesting part of a speech signal
AudioJoin ▪ AudioReplace ▪ LowpassFilter ▪ WienerFilter ▪ ...