SequencePredict

SequencePredict[{seq1,seq2,…}]

generates a SequencePredictorFunction[…] based on the sequences given.

SequencePredict[training,seq]

attempts to predict the next element in the sequence seq from the training sequences given.

SequencePredict[training,{seq1,seq2,…}]

gives predictions for each of the sequences seqi.

SequencePredict["name",seq]

uses the built-in sequence predictor represented by "name".

SequencePredict[…,seq,prop]

gives the specified property of the prediction associated with seq.

Details and Options

  • The sequences seqi can be lists of either tokens or strings.
  • Sequences seqi are assumed to be unordered subsequences of an underlying infinite sequence.
  • In SequencePredict[…,seq,prop], properties are as given in SequencePredictorFunction[…]; they include:
  • "NextElement"    most likely next element
    {"NextElement",n}    individually most likely next n elements
    {"NextSequence",n}    most likely next length-n sequence of elements
    "RandomNextElement"    random sample from the next-element distribution
    {"RandomNextElement",n}    random sample from the next-sequence distribution
    "Probabilities"    association of probabilities for all possible next elements
    "SequenceProbability"    probability for the predictor to generate the given sequence
    "SequenceLogProbability"    log probability for the predictor to generate the sequence
    "Properties"    list of all properties available
  • Examples of built-in sequence predictors include:
  • "Chinese"    character-based Chinese-language text
    "English"    character-based English-language text
    "French"    character-based French-language text
    "German"    character-based German-language text
    "Portuguese"    character-based Portuguese-language text
    "Russian"    character-based Russian-language text
    "Spanish"    character-based Spanish-language text
  • The following options can be given:
  • FeatureExtractor    Automatic    how to preprocess sequences
    Method    Automatic    which prediction algorithm to use
    PerformanceGoal    Automatic    aspects of performance to try to optimize
    RandomSeeding    1234    what seeding of pseudorandom generators should be done internally
  • Typical settings for FeatureExtractor for strings include:
  • "SegmentedCharacters"    string interpreted as a sequence of characters (default)
    "SegmentedWords"    string interpreted as a sequence of words
  • Possible settings for PerformanceGoal include:
  • "Memory"    minimize storage requirements of the predictor
    "Quality"    maximize accuracy of the predictor
    "Speed"    maximize speed of the predictor
    "TrainingSpeed"    minimize time spent producing the predictor
    Automatic    automatic tradeoff among speed, accuracy and memory
  • PerformanceGoal->{goal1,goal2,…} will automatically combine goal1, goal2, etc.
  • Possible settings for RandomSeeding include:
  • Automatic    automatically reseed every time the function is called
    Inherited    use externally seeded random numbers
    seed    use an explicit integer or string as a seed
  • Possible settings for Method include:
  • "Markov"    Markov model
  • In SequencePredict[…,Method->{"Markov","Order"->order}], order corresponds to the memory size of the Markov process.
  • In SequencePredict[…,"SequenceProbability"], some probability mass is kept for unknown elements.
  • In SequencePredict[training,{},prop], {} is interpreted as an empty list of sequences rather than an empty sequence.

Examples


Basic Examples  (1)

Train a sequence predictor on a set of sequences:

Predict the next element of a new sequence:

Obtain the probabilities of the next element given the sequence:

Obtain a random next element according to the preceding distribution:

Obtain multiple predictions at a time:

Predict the most likely next element and reuse this intermediate guess to predict the following element:

Predict the most likely following sequence:

Compare the probabilities for the preceding sequences:
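The basic-examples session above might look like the following sketch; the training sequences here are illustrative, not the ones from the original notebook:

```wl
(* train a predictor on a few integer sequences (illustrative data) *)
sp = SequencePredict[{{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 2, 3, 5}}];

sp[{1, 2, 3}]                          (* most likely next element *)
sp[{1, 2, 3}, "Probabilities"]         (* association of next-element probabilities *)
sp[{1, 2, 3}, "RandomNextElement"]     (* sample from that distribution *)
sp[{{1, 2, 3}, {2, 3, 4}}]             (* predictions for several sequences at once *)

(* reuse the first guess to predict the element that follows it *)
next = sp[{1, 2, 3}];
sp[Append[{1, 2, 3}, next]]

sp[{1, 2, 3}, {"NextSequence", 2}]       (* most likely next length-2 sequence *)
sp[{1, 2, 3, 4}, "SequenceProbability"]  (* probability of a given sequence *)
```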

Scope  (4)

Custom Sequence Predictors  (3)

Train a sequence predictor on a list of strings:

Predict the next character following a given string:

Predict the next four characters:

Obtain the probabilities for each character to follow the given string:
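A sketch of this string-based session; the training strings are made up for illustration:

```wl
sp = SequencePredict[{"hello world", "hello there", "hello again"}];

sp["hello "]                       (* most likely next character *)
sp["hello ", {"NextSequence", 4}]  (* most likely next four characters *)
sp["hello ", "Probabilities"]      (* probabilities for each possible next character *)
```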

Train a sequence predictor on the list of common English words, each word treated as a sequence of characters:

Predict the most likely next character from a given sequence:

In the previous example, each word is considered as a subsequence of an infinite sequence. Use the character | to mark boundaries between words:

Build a new sequence predictor aware of word boundaries:

Generate the beginning of an English-like word:
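One possible construction of the word-boundary experiment, assuming WordList[] as the source of common English words and "|" appended to each word as an end marker:

```wl
words = WordList[];   (* common English words *)

(* predictor that treats each word as a bare character sequence *)
sp = SequencePredict[words];
sp["qu"]   (* most likely character to follow "qu" *)

(* predictor aware of word boundaries, marked with "|" *)
spBoundary = SequencePredict[Map[StringJoin[#, "|"] &, words]];

(* generate the beginning of an English-like word, starting at a boundary *)
spBoundary["|", {"RandomNextElement", 5}]
```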

Load a book from ExampleData:

Train a sequence predictor on this book:

Sample a random string in the book style:

Train another sequence predictor, interpreting strings as word sequences rather than character sequences:

Complete the preceding string with 10 consecutive words (spaces and punctuation marks are considered as words):
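The book experiment could be reproduced along these lines; the choice of book and the prompt strings are assumptions:

```wl
text = ExampleData[{"Text", "AliceInWonderland"}];

(* character-level predictor trained on the book *)
sp = SequencePredict[{text}];
sp["The ", {"RandomNextElement", 40}]   (* random string in the book's style *)

(* word-level predictor: strings interpreted as word sequences *)
spWords = SequencePredict[{text}, FeatureExtractor -> "SegmentedWords"];
spWords["Alice was beginning", {"NextSequence", 10}]  (* 10 consecutive words *)
```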

Built-in Sequence Predictors  (1)

Download the "English" built-in sequence predictor:

Obtain the log-probability of the given string:
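A sketch of the built-in predictor example; the test string is an assumption:

```wl
sp = SequencePredict["English"];   (* downloads the built-in predictor *)
sp["The quick brown fox", "SequenceLogProbability"]
```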

Options  (5)

FeatureExtractor  (2)

Preprocess the training text to predict on words rather than at the character level:

Complete the preceding string with 10 consecutive words (spaces and punctuation marks are considered as words):

Preprocess the training text to lowercase to obtain a better statistic with higher letter counts:
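These two FeatureExtractor settings might be written as follows; the training text and the use of ToLowerCase as the extractor are assumptions:

```wl
text = ExampleData[{"Text", "AliceInWonderland"}];

(* word-level rather than character-level prediction *)
spWords = SequencePredict[{text}, FeatureExtractor -> "SegmentedWords"];
spWords["Alice was beginning", {"NextSequence", 10}]

(* lowercase the text first, pooling counts for upper- and lowercase letters *)
spLower = SequencePredict[{text}, FeatureExtractor -> ToLowerCase];
spLower["the "]
```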

PerformanceGoal  (2)

Train a sequence predictor with an emphasis on the resulting model memory footprint:

Compare with the automatically generated model size:
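The memory comparison can be sketched with ByteCount; the training text is illustrative:

```wl
text = ExampleData[{"Text", "AliceInWonderland"}];

spMemory = SequencePredict[{text}, PerformanceGoal -> "Memory"];
spAuto   = SequencePredict[{text}];

ByteCount /@ {spMemory, spAuto}   (* the "Memory" model should be smaller *)
```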

Tune the computation time and precision when exploring the full sequence probability space:

Favor fast and approximated exploration:

Favor more in-depth exploration taking longer computation time:

Compare the results:
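One reading of this example, assuming PerformanceGoal is also accepted at prediction time to control how thoroughly the sequence probability space is searched:

```wl
sp = SequencePredict[{ExampleData[{"Text", "AliceInWonderland"}]}];

(* fast, approximate exploration of the next-sequence space *)
fast = sp["Alice ", {"NextSequence", 8}, PerformanceGoal -> "Speed"];

(* slower, more thorough exploration *)
deep = sp["Alice ", {"NextSequence", 8}, PerformanceGoal -> "Quality"];

{fast, deep}
```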

Method  (1)

Specify a memory size of 3 for the Markov process trained on the training subsequences:
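For instance, with an illustrative training sequence:

```wl
sp = SequencePredict[{{1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4}},
  Method -> {"Markov", "Order" -> 3}];

sp[{2, 3, 4}]   (* prediction now conditions on the last 3 elements *)
```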

Possible Issues  (1)

An empty list is parsed as the list with no sequences inside and will return an empty list:

To obtain the most likely next element completing an empty sequence, nest it in a second list for disambiguation:
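A sketch of the disambiguation, with illustrative training data:

```wl
sp = SequencePredict[{{1, 2, 3}, {1, 2, 4}}];

sp[{}]     (* empty list of sequences: returns {} *)
sp[{{}}]   (* one empty sequence: returns its most likely first element *)
```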

Introduced in 2017 (11.1) | Updated in 2017 (11.2)