"Markov" (Machine Learning Method)

Details & Suboptions

  • In a Markov model, at training time, an n-gram language model is computed for each class. At test time, the probability of each class is computed according to Bayes's theorem, P(class|sequence) ∝ P(sequence|class) P(class), where P(sequence|class) is given by the language model of the given class and P(class) is the class prior.
  • The following options can be given:
  • "AdditiveSmoothing"   0.1         the smoothing parameter to use
    "MinimumTokenCount"   Automatic   minimum count for an n-gram to be considered
    "Order"               Automatic   n-gram length
  • When "Order"->n, the method partitions sequences into (n+1)-grams.
  • When "Order"->0, the method uses unigrams (single tokens). The model can then be called a unigram model or a naive Bayes model.
  • The value of "AdditiveSmoothing" is added to all n-gram counts. It is used to regularize the language model.
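The per-class training and Bayes scoring described above can be sketched outside the Wolfram Language. The following minimal Python illustration (not the actual implementation; all names are made up) mirrors the "Order" and "AdditiveSmoothing" suboptions:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All length-n windows of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class MarkovClassifier:
    """Sketch of the "Markov" method: one n-gram language model per
    class, combined with the class priors via Bayes's theorem.
    order=0 reduces to a unigram (naive Bayes) model."""

    def __init__(self, order=1, smoothing=1.0):
        self.n = order + 1          # "Order" -> n uses (n+1)-grams
        self.smoothing = smoothing  # "AdditiveSmoothing"

    def fit(self, examples):
        # examples: list of (token_list, label) pairs
        self.counts = {}            # per-class n-gram counts
        self.priors = Counter()
        for tokens, label in examples:
            self.priors[label] += 1
            c = self.counts.setdefault(label, Counter())
            c.update(ngrams(tokens, self.n))
        total = sum(self.priors.values())
        self.log_prior = {l: math.log(k / total) for l, k in self.priors.items()}
        self.vocab = {g for c in self.counts.values() for g in c}

    def log_prob(self, label, tokens):
        # log P(class) + sum of log P(gram | class), additively smoothed:
        # the smoothing value is added to every n-gram count
        c = self.counts[label]
        denom = sum(c.values()) + self.smoothing * len(self.vocab)
        lp = self.log_prior[label]
        for g in ngrams(tokens, self.n):
            lp += math.log((c[g] + self.smoothing) / denom)
        return lp

    def classify(self, tokens):
        return max(self.counts, key=lambda l: self.log_prob(l, tokens))
```

For example, a classifier with order 1 trained on the character sequences `list("aaab") -> "A"` and `list("bbba") -> "B"` assigns `"A"` to `list("aaaa")`, since the bigram ("a","a") is frequent in class A's language model.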

Examples


Basic Examples  (1)

Train a classifier function on labeled examples:

Obtain information about the classifier:

Classify a new example:

Options  (4)

"AdditiveSmoothing"  (2)

Train a classifier using the "AdditiveSmoothing" suboption:

Train two classifiers on an imbalanced dataset by varying the value of "AdditiveSmoothing":

Look at the corresponding probabilities for the imbalanced element:
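The effect shown in this example can also be sketched numerically in Python (a hypothetical unigram setup, not the Wolfram Language code): with a small smoothing value, the rare class's distinctive token keeps its weight, while a large smoothing value flattens the per-class frequencies and lets the majority-class prior dominate.

```python
import math
from collections import Counter

def smoothed_log_prob(tokens, counts, prior, vocab_size, alpha):
    # log P(class) + sum of log of additively smoothed token frequencies
    denom = sum(counts.values()) + alpha * vocab_size
    return math.log(prior) + sum(
        math.log((counts[t] + alpha) / denom) for t in tokens)

# hypothetical imbalanced training counts: class "big" has 9x the data
big_counts = Counter({"x": 20, "y": 5})
small_counts = Counter({"y": 3})

def posterior_small(alpha, tokens=("y", "y")):
    """P(small | tokens) for unigram models with smoothing alpha."""
    lp_big = smoothed_log_prob(tokens, big_counts, 0.9, 2, alpha)
    lp_small = smoothed_log_prob(tokens, small_counts, 0.1, 2, alpha)
    return math.exp(lp_small) / (math.exp(lp_big) + math.exp(lp_small))

print(posterior_small(0.1))   # weak smoothing: rare class keeps its signal
print(posterior_small(10.0))  # strong smoothing: the prior dominates
```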

"Order"  (2)

Train a classifier by specifying the "Order":

Generate a dataset of real words and random strings:

Generate classifiers using different values for the "Order":

Compare the probabilities of these classifiers on a new real word:
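Why the order matters can be sketched in Python (an illustrative toy, not the notebook code above): a unigram model (order 0) sees only letter frequencies, so a string with the right letters but scrambled transitions scores the same as a well-formed one, while a bigram model (order 1) penalizes the unseen transitions.

```python
import math
from collections import Counter

def avg_log_prob(seq, train, n, alpha=1.0):
    # average additively smoothed n-gram log-probability of seq under a
    # model trained on train; n = order + 1 in the text above
    grams = lambda s: [tuple(s[i:i + n]) for i in range(len(s) - n + 1)]
    counts = Counter(grams(train))
    vocab = len(set(grams(train)) | set(grams(seq)))
    denom = sum(counts.values()) + alpha * vocab
    scores = [math.log((counts[g] + alpha) / denom) for g in grams(seq)]
    return sum(scores) / len(scores)

train = "abababababab"   # stand-in for "real words": strict alternation
good = "ababab"          # follows the training transitions
shuffled = "aabbababab"  # same letters, different transitions

# n=1 (order 0, unigrams) cannot tell them apart; n=2 (order 1, bigrams) can
print(avg_log_prob(good, train, 1), avg_log_prob(shuffled, train, 1))
print(avg_log_prob(good, train, 2), avg_log_prob(shuffled, train, 2))
```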