SpeechCases

SpeechCases[audio,form]

gives a list of cases of text identified as being of type form that appear in the transcription of audio.

SpeechCases[audio,{form1,form2,…}]

gives an association of results for all the types formi.

SpeechCases[audio,formspec->prop]

gives the specified property for each result found.

SpeechCases[audio,formspec->{prop1,prop2,…}]

gives a list of properties for each result found.

SpeechCases[audio,spec,n]

gives the first n cases found.

Details and Options

  • SpeechCases[{audio1,audio2,…},…] gives cases for each audioi.
  • Identification type form can be:
  • "type"    any text content type (e.g. "Noun", "City")
    Entity[…]    a specific entity of a text content type
    form1|form2|…    forms matching any of the formi
    Containing[outer,inner]    forms of type outer containing type inner
    Verbatim["string"]    a specific string to be matched exactly
    pattern    a string pattern to be matched
  • Possible choices for the property prop are:
  • "String"    string of the identified text (default)
    "Position"    start and end position of the string in the transcribed text
    "Probability"    estimated probability that the identification is correct
    "Interpretation"    standard interpretation of the identified string
    "Snippet"    a snippet around the identified string
    "HighlightedSnippet"    a snippet with the identified string highlighted
    f    apply f to the association containing all properties
    {prop1,prop2,…}    a list of property specifications
  • The following options can be given:
  • AcceptanceThreshold    Automatic    minimum probability to accept an identification
    Masking    All    interval of interest
    PerformanceGoal    Automatic    favor algorithms with specific advantages
    TargetDevice    "CPU"    whether CPU or GPU computation should be used for entity detection
    VerifyInterpretation    False    whether interpretability should be verified
  • SpeechCases uses machine learning. Its methods, training sets and the biases included therein may change and yield varied results in different versions of the Wolfram Language.
  • SpeechCases may download resources that will be stored in your local object store at $LocalBase, and that can be listed using LocalObjects[] and removed using ResourceRemove.

Examples


Basic Examples  (2)

Find the cities in a speech recording:

Find the cities and get interpretations:
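
These two basic examples might look like the following (the input file name is hypothetical; any Audio object containing speech works):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* cities mentioned in the recording, returned as strings *)
SpeechCases[audio, "City"]

(* the same cities, interpreted as Entity objects *)
SpeechCases[audio, "City" -> "Interpretation"]
```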

Scope  (13)

Basic Uses  (3)

Find all cities:

Find and interpret all instances of cities:

Specify the maximum number of identifications to return for each type:
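
As a sketch, the third argument caps the number of results per type (the recording is hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* return at most three nouns *)
SpeechCases[audio, "Noun", 3]
```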

Form Specification  (4)

Find the spoken nouns:

Find the spoken words:

Find cities and countries in a recording:

Use Alternatives to match multiple types:
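
A sketch of the Alternatives form, matching text identified as either type (recording hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* match text identified as either a city or a country *)
SpeechCases[audio, "City" | "Country"]
```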

Find all sentences in a recording that contain currency amounts:
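
This uses the Containing form, with sentences as the outer type and currency amounts as the inner type (sketch; recording hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* sentences of the transcription that contain a currency amount *)
SpeechCases[audio, Containing["Sentence", "CurrencyAmount"]]
```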

Properties  (6)

Find currency amounts and get interpretations:

Obtain probabilities and interpretations for detected cities and countries:

Specify multiple return types:

Show all available properties in an Association:
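
Per the property table above, giving a function f as the property applies it to the association of all properties; passing Identity should therefore return that association unchanged (a sketch, with a hypothetical recording):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* full association of properties for each detected city *)
SpeechCases[audio, "City" -> Identity]
```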

Create a dataset with the properties of several types of entities:

Get the geodetic positions of the locations occurring in a spoken text:
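
One way to sketch this: interpret the spoken locations as entities, then query their positions with EntityValue (recording hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* interpret spoken city names as Entity objects *)
cities = SpeechCases[audio, "City" -> "Interpretation"];

(* geodetic position (GeoPosition) of each city *)
EntityValue[cities, "Position"]
```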

Options  (3)

AcceptanceThreshold  (1)

By default, an automatic acceptance threshold is used:

Specify the minimum identification probability:
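
For example (recording hypothetical), a numeric threshold discards low-probability identifications:

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* keep only identifications with probability at least 0.9 *)
SpeechCases[audio, "Noun", AcceptanceThreshold -> 0.9]
```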

Masking  (1)

By default, the whole audio signal is processed for interpretable results:

Search for nouns only in the first half of the signal:
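
A sketch of this, assuming Masking accepts a {start, end} time interval in seconds (recording hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* half the duration of the signal, in seconds *)
half = QuantityMagnitude[Duration[audio], "Seconds"]/2;

(* restrict the search to the first half of the signal; interval
   assumed to be specified in seconds *)
SpeechCases[audio, "Noun", Masking -> {0, half}]
```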

VerifyInterpretation  (1)

By default, the interpretability of a result is not verified and a string is returned instead of an interpretation:

Filter out the entities that cannot be interpreted:
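
For example (recording hypothetical):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

(* drop results whose interpretation cannot be verified *)
SpeechCases[audio, "City" -> "Interpretation", VerifyInterpretation -> True]
```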

Properties & Relations  (2)

SpeechCases is effectively calling TextCases on the result of SpeechRecognize:

SpeechCases supports the same identification types as TextCases:

They identify the same substrings for a given type and transcription:
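
A sketch of the relationship described above (recording hypothetical; results depend on the transcription):

```wl
audio = Audio["speech-recording.wav"];  (* hypothetical recording *)

transcript = SpeechRecognize[audio];

(* these two calls should identify the same substrings *)
TextCases[transcript, "City"]
SpeechCases[audio, "City"]
```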

Possible Issues  (1)

The ability of SpeechCases to return an interpretable result is limited by the quality of the speech transcription:

See the transcription generated by SpeechRecognize:

The mistranscription of "Bergen" cannot be interpreted as a city in Norway:

Introduced in 2020 (12.1)