SequencePredict

SequencePredict[{seq1,seq2,…}]

generates a SequencePredictorFunction[…] based on the sequences given.

SequencePredict[training,seq]

attempts to predict the next element in the sequence seq from the training sequences given.

SequencePredict[training,{seq1,seq2,…}]

gives predictions for each of the sequences seqi.

SequencePredict["name",seq]

uses the built-in sequence predictor represented by "name".

SequencePredict[…,seq,prop]

gives the specified property of the prediction associated with seq.

Details and Options

  • Each sequence seqi can be a list of tokens or a string.
  • Sequences seqi are assumed to be subsequences, taken in no particular order, of an underlying infinite sequence.
  • In SequencePredict[…,seq,prop], properties are as given in SequencePredictorFunction[…]; they include:
  • "NextElement"most likely next element
    "NextElement"nindividually most likely next n elements
    "NextSequence"nmost likely next length-n sequence of elements
    "RandomNextElement"random sample from the next-element distribution
    "RandomNextElement"nrandom sample from the next-sequence distribution
    "Probabilities"association of probabilities for all possible next elements
    "SequenceProbability"probability for the predictor to generate the given sequence
    "SequenceLogProbability"log probability for the predictor to generate the sequence
    "Properties"list of all properties available
  • Examples of built-in sequence predictors include:
  • "Chinese"character-based Chinese-language text
    "English"character-based English-language text
    "French"character-based French-language text
    "German"character-based German-language text
    "Portuguese"character-based Portuguese-language text
    "Russian"character-based Russian-language text
    "Spanish"character-based Spanish-language text
  • The following options can be given:
  • FeatureExtractor	Automatic	how to preprocess sequences
    Method	Automatic	which prediction algorithm to use
    PerformanceGoal	Automatic	aspects of performance to try to optimize
    RandomSeeding	1234	what seeding of pseudorandom generators should be done internally
  • Typical settings for FeatureExtractor for strings include:
  • "SegmentedCharacters"string interpreted as a sequence of characters (default)
    "SegmentedWords"string interpreted as a sequence of words
  • Possible settings for PerformanceGoal include:
  • "Memory"minimize storage requirements of the predictor
    "Quality"maximize accuracy of the predictor
    "Speed"maximize speed of the predictor
    "TrainingSpeed"minimize time spent producing the predictor
    Automaticautomatic tradeoff among speed, accuracy and memory
  • PerformanceGoal->{goal1,goal2,…} will automatically combine goal1, goal2, etc.
  • Possible settings for RandomSeeding include:
  • Automatic	automatically reseed every time the function is called
    Inherited	use externally seeded random numbers
    seed	use an explicit integer or string as a seed
  • Possible settings for Method include:
  • "Markov"Markov model
  • In SequencePredict[,Method{"Markov","Order"order}], order corresponds to Markov process memory size.
  • In SequencePredict[,"SequenceProbability"], some probability mass is kept for unknown elements.
  • In SequencePredict[training,{},prop], {} is interpreted as an empty list of sequences rather than an empty sequence.

Examples


Basic Examples  (1)

Train a sequence predictor on a set of sequences:

Predict the next element of a new sequence:
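The original input cells are not reproduced here; a minimal sketch of the two steps above, using hypothetical integer sequences as training data, might look like:

    training = {{1, 2, 3, 4}, {1, 2, 3, 2, 1}, {2, 3, 4, 1, 2}};  (* hypothetical training sequences *)
    sp = SequencePredict[training]    (* returns a SequencePredictorFunction *)

    sp[{1, 2, 3}]    (* most likely next element of the new sequence {1,2,3} *)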

Obtain the probabilities of the next element given the sequence:

Obtain a random next element according to the preceding distribution:

Obtain multiple predictions at a time:
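Continuing with the predictor sp trained in the sketch above, the three preceding steps could be sketched as follows (the actual outputs depend on the hypothetical training data):

    sp[{1, 2, 3}, "Probabilities"]        (* association of next-element probabilities *)
    sp[{1, 2, 3}, "RandomNextElement"]    (* random sample from that distribution *)
    sp[{{1, 2}, {2, 3}, {3, 2}}]          (* one prediction per sequence in the list *)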

Predict the most likely next element and reuse this intermediate guess to predict the following element:

Predict the most likely following sequence:

Compare the probabilities for the preceding sequences:
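A sketch of the last three steps, again using the hypothetical predictor sp from above; the final comparison uses the "SequenceProbability" property on the two candidate completions:

    next1 = sp[{1, 2, 3}];                 (* most likely next element *)
    next2 = sp[Append[{1, 2, 3}, next1]];  (* reuse the guess to predict the following element *)

    seqA = Join[{1, 2, 3}, {next1, next2}];
    seqB = Join[{1, 2, 3}, sp[{1, 2, 3}, {"NextSequence", 2}]];  (* most likely next length-2 sequence *)

    (* compare how probable the predictor finds the two completions *)
    sp[seqA, "SequenceProbability"]
    sp[seqB, "SequenceProbability"]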

Scope  (4)

Custom Sequence Predictors  (3)

Train a sequence predictor on a list of strings:

Predict the next character following a given string:

Predict the next four characters:

Obtain the probabilities for each character to follow the given string:
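A minimal sketch of this example using a few hypothetical training strings; a string is treated as a sequence of characters by default:

    sp = SequencePredict[{"hello world", "hello there", "hello again"}];

    sp["hello "]                          (* most likely next character *)
    sp["hello ", {"NextSequence", 4}]     (* most likely next four characters *)
    sp["hello ", "Probabilities"]         (* probabilities for each possible next character *)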

Train a sequence predictor on the list of common English words, each word treated as a sequence of characters:

Predict the most likely next character from a given sequence:

In the previous example, each word is considered as a subsequence of an infinite sequence. Use the character | to mark boundaries between words:

Build a new sequence predictor aware of word boundaries:

Generate the beginning of an English-like word:
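One possible sketch of this example; it assumes WordList[] as the source of common English words and marks word boundaries by surrounding each word with the character "|" (the original may construct the boundary-marked training set differently):

    words = WordList[];      (* list of common English words *)
    sp = SequencePredict[words];

    sp["concat"]             (* most likely next character for a given fragment *)

    boundedWords = Map["|" <> # <> "|" &, words];   (* "|" marks the start and end of each word *)
    sp2 = SequencePredict[boundedWords];

    sp2["|", {"RandomNextElement", 6}]   (* random beginning of an English-like word *)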

Load a book from ExampleData:

Train a sequence predictor on this book:

Sample a random string in the book style:

Train another sequence predictor, interpreting strings as word sequences rather than character sequences:

Complete the preceding string with 10 consecutive words (spaces and punctuation marks are considered as words):
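A sketch of this example, assuming the "AliceInWonderland" text from ExampleData as the book; the word-level variant uses the FeatureExtractor option described in the Details section:

    text = ExampleData[{"Text", "AliceInWonderland"}];

    sp = SequencePredict[{text}];                      (* character-level predictor *)
    sp["Alice was ", {"RandomNextElement", 100}]       (* random continuation in the book's style *)

    spWords = SequencePredict[{text}, FeatureExtractor -> "SegmentedWords"];
    spWords["Alice was ", {"NextSequence", 10}]        (* 10 most likely consecutive words *)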

Built-in Sequence Predictors  (1)

Download the "English" built-in sequence predictor:

Obtain the log-probability of the given string:
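A sketch of these two steps; it assumes that calling SequencePredict with just the predictor name downloads and returns the built-in SequencePredictorFunction (the documented form SequencePredict["English",seq,prop] can be used directly instead):

    sp = SequencePredict["English"];                       (* built-in character-based English predictor *)
    sp["The quick brown fox", "SequenceLogProbability"]    (* log probability of the string *)

    (* equivalent one-shot form using the documented syntax *)
    SequencePredict["English", "The quick brown fox", "SequenceLogProbability"]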

Options  (5)

FeatureExtractor  (2)

Preprocess the training text to predict on words rather than at the character level:

Complete the preceding string with 10 consecutive words (spaces and punctuation marks are considered as words):
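A sketch of the word-level setup, assuming the "AliceInWonderland" text from ExampleData as the training text:

    text = ExampleData[{"Text", "AliceInWonderland"}];
    sp = SequencePredict[{text}, FeatureExtractor -> "SegmentedWords"];

    sp["Alice was ", {"NextSequence", 10}]   (* spaces and punctuation marks count as words *)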

Preprocess the training text to lowercase to obtain a better statistic with higher letter counts:
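One way to sketch this step: the text is simply lowercased before training, so that for example "The" and "the" contribute to the same character statistics (the original may instead pass a lowercasing function through the FeatureExtractor option):

    text = ExampleData[{"Text", "AliceInWonderland"}];
    spLower = SequencePredict[{ToLowerCase[text]}];

    spLower["alice was "]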

PerformanceGoal  (2)

Train a predictor with an emphasis on the memory footprint of the resulting model:

Compare with the automatically generated model size:
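A sketch of the comparison, again assuming the "AliceInWonderland" training text; ByteCount gives a rough measure of each model's storage requirements:

    text = ExampleData[{"Text", "AliceInWonderland"}];

    spMemory = SequencePredict[{text}, PerformanceGoal -> "Memory"];
    spAuto   = SequencePredict[{text}];

    ByteCount /@ {spMemory, spAuto}   (* the "Memory" model should be the smaller of the two *)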

Tune the computation time and precision when exploring the full sequence probability space:

Favor fast, approximate exploration:

Favor more in-depth exploration at the cost of longer computation time:

Compare the results:
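A sketch of this comparison on a small hypothetical token predictor; it assumes that PerformanceGoal can also be given when querying the predictor for a {"NextSequence",n} property, controlling how thoroughly the space of possible continuations is explored:

    sp = SequencePredict[{{"a", "b", "c", "d"}, {"a", "b", "d", "c"}, {"b", "c", "d", "a"}}];

    fast = sp[{"a", "b"}, {"NextSequence", 3}, PerformanceGoal -> "Speed"];    (* fast, approximate search *)
    deep = sp[{"a", "b"}, {"NextSequence", 3}, PerformanceGoal -> "Quality"];  (* slower, more thorough search *)

    (* compare which continuation the predictor itself rates as more probable *)
    sp[Join[{"a", "b"}, fast], "SequenceLogProbability"]
    sp[Join[{"a", "b"}, deep], "SequenceLogProbability"]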

Method  (1)

Specify a memory size of 3 for the Markov process trained on the training subsequences:
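A minimal sketch with hypothetical training sequences; the "Order" suboption sets how many preceding elements the Markov model conditions on:

    SequencePredict[{{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {1, 2, 3, 5, 4}},
      Method -> {"Markov", "Order" -> 3}]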

Possible Issues  (1)

An empty list is interpreted as a list containing no sequences and therefore returns an empty list of predictions:

To obtain the most likely next element of an empty sequence, wrap the empty sequence in an extra list to disambiguate:
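A sketch of both cases with hypothetical training data:

    training = {{1, 2, 3}, {2, 3, 4}};

    SequencePredict[training, {}]      (* no sequences to predict, so the result is {} *)
    SequencePredict[training, {{}}]    (* one empty sequence: gives its most likely next element *)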

Text

Wolfram Research (2017), SequencePredict, Wolfram Language function, https://reference.wolfram.com/language/ref/SequencePredict.html (updated 2017).

CMS

Wolfram Language. 2017. "SequencePredict." Wolfram Language & System Documentation Center. Wolfram Research. Last Modified 2017. https://reference.wolfram.com/language/ref/SequencePredict.html.

APA

Wolfram Language. (2017). SequencePredict. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/SequencePredict.html

BibTeX

@misc{reference.wolfram_2023_sequencepredict, author="Wolfram Research", title="{SequencePredict}", year="2017", howpublished="\url{https://reference.wolfram.com/language/ref/SequencePredict.html}", note="[Accessed: 19-April-2024]"}

BibLaTeX

@online{reference.wolfram_2023_sequencepredict, organization={Wolfram Research}, title={SequencePredict}, year={2017}, url={https://reference.wolfram.com/language/ref/SequencePredict.html}, note={[Accessed: 19-April-2024]}}