Wolfram Language & System Documentation Center

"RandomForest" (Machine Learning Method)

Method for Classify and Predict.
Predict the value or class of an example using an ensemble of decision trees.

Details & Suboptions

Random forest is an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees. The forest prediction is the most common class among the tree predictions (classification) or the mean of the tree predictions (regression). Each decision tree is trained on a random subset of the training set (bootstrap aggregating) and uses only a random subset of the features.
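The aggregation and resampling steps described above can be sketched with built-in functions. This is only an illustration, not the internal implementation of the method: the names treeVotes, treeValues, examples and nFeatures and all values are hypothetical, and drawing the bootstrap sample with replacement is an assumption.

Wolfram Language code:

(* hypothetical outputs of five trees for a single example *)
treeVotes = {"A", "B", "A", "A", "B"};
treeValues = {0.2, 0.4, 0.3, 0.5, 0.6};
(* classification: most common class among the tree predictions *)
First[Commonest[treeVotes]]
(* regression: mean of the tree predictions *)
Mean[treeValues]
(* bootstrap sample: draw as many examples as the set contains, with replacement (assumed) *)
examples = Range[10];
RandomChoice[examples, Length[examples]]
(* random feature subset: 3 of 8 hypothetical features, without replacement *)
nFeatures = 8;
RandomSample[Range[nFeatures], 3]
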
The following options can be given:

"DistributionSmoothing"	0.5	regularization parameter
"FeatureFraction"	Automatic	the fraction of features to be randomly selected to train each tree
"LeafSize"	Automatic	the maximum number of examples in each leaf
"TreeNumber"	Automatic	the number of trees in the forest

"FeatureFraction", "LeafSize" and "DistributionSmoothing" can be used to control overfitting.
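
Suboptions are given together with the method name in a list. A minimal sketch combining several of them; the data and the specific values are arbitrary choices for illustration, not recommended settings:

Wolfram Language code:

(* small illustrative training set with one numeric feature *)
trainingset = {1 -> "A", 2 -> "A", 3.5 -> "B", 4 -> "B", 5 -> "A", 5.5 -> "A"};
(* several suboptions passed in the same Method list *)
Classify[trainingset, 
	Method -> {"RandomForest", "TreeNumber" -> 50, "LeafSize" -> 2, "DistributionSmoothing" -> 1}]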

Examples

Basic Examples (3)

Train a predictor on labeled examples:

Wolfram Language code: p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, .9}, Method -> "RandomForest"]

Obtain information about the predictor:

Wolfram Language code: Information[p]

Predict a new example:

Wolfram Language code: p[1.3]
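
A predictor can also return a predictive distribution instead of a point value, through the "Distribution" property. A short sketch that trains its own predictor so that it runs independently; the name q and the training values are illustrative only:

Wolfram Language code:

(* illustrative predictor, trained separately from the example above *)
q = Predict[{1, 2, 3, 4} -> {.3, .4, .6, .9}, Method -> "RandomForest"];
(* predictive distribution for a new example *)
dist = q[1.3, "Distribution"];
(* summarize the distribution by its mean and standard deviation *)
{Mean[dist], StandardDeviation[dist]}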

Train a classifier function on labeled examples:

Wolfram Language code: trainingset = {1 -> "A", 2 -> "A", 3.5 -> "B", 4 -> "B", 5 -> "A", 5.5 -> "A"};

Wolfram Language code: c = Classify[trainingset, Method -> "RandomForest"]

Plot the probabilities that the class of an example is "A" or "B" as functions of the feature and compare them:

Wolfram Language code:

Grid[{{Plot[c[x, "Probability" -> "A"], {x, 0, 5}, Exclusions -> None], Plot[c[x, "Probability" -> "B"], {x, 0, 5}, Exclusions -> None]}}, Frame -> All]
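
The class probabilities at a single point can also be inspected directly with the "Probabilities" property, which returns an association of classes to probabilities. A sketch that retrains the classifier so that it runs on its own; the query value 3.7 is arbitrary:

Wolfram Language code:

trainingset = {1 -> "A", 2 -> "A", 3.5 -> "B", 4 -> "B", 5 -> "A", 5.5 -> "A"};
c = Classify[trainingset, Method -> "RandomForest"];
(* probabilities of all classes for one example *)
c[3.7, "Probabilities"]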

Train a predictor function on labeled data:

Wolfram Language code:

data = {-2.2 -> -0.5, 2.2 -> 0.7, -2.8 -> -0.7, 1. -> 0.60, -0.34 -> -0.4, 3.7 -> -0.6, -0.63 -> -0.8, -3.33 -> 0.1, 0.4 -> 0.4, 2.1 -> 0.8};

Wolfram Language code: p = Predict[data, Method -> "RandomForest"]

Compare the data with the predicted values and look at the standard deviation:

Wolfram Language code:

Show[
	Plot[{p[x], 
		p[x] + StandardDeviation[p[x, "Distribution"]], 
		p[x] - StandardDeviation[p[x, "Distribution"]]}, 
		{x, -2, 6}, 
		PlotStyle -> {Blue, Gray, Gray}, 
		Filling -> {2 -> {3}}, 
		Exclusions -> False, 
		PerformanceGoal -> "Speed", 
		PlotLegends -> {"Prediction", "Confidence Interval"}], 
	ListPlot[List @@@ data, PlotStyle -> Red, PlotLegends -> {"Data"}]]

Options (6)

"DistributionSmoothing" (2)

Train a classifier using the "DistributionSmoothing" suboption:

Wolfram Language code:

Classify[{-2.2 -> 1, 2.2 -> 1, -2.8 -> 2, 1. -> 2, -0.34 -> 3, 3.7 -> 4, -0.63 -> 4}, Method -> {"RandomForest", "DistributionSmoothing" -> 6}]

Use the "Titanic" training set to train a classifier with the default value of "DistributionSmoothing":

Wolfram Language code: data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];

Wolfram Language code: classifier = Classify[data, Method -> "RandomForest"];

Train a second classifier using a large "DistributionSmoothing":

Wolfram Language code: smoothed = Classify[data, Method -> {"RandomForest", "DistributionSmoothing" -> 60}]

Compare the probabilities for examples from a test set:

Wolfram Language code: testdata = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];

Wolfram Language code:

sample = RandomSample[testdata, 4];
Dataset@<|"AutomaticClassifier" -> 
	classifier[sample[[All, 1]], "Probabilities"], 
	"SmoothedClassifier" -> smoothed[sample[[All, 1]], "Probabilities"]|>

"FeatureFraction" (2)

Train a predictor on high-dimensional data using the "FeatureFraction" suboption:

Wolfram Language code: Predict[RandomReal[1, {200, 200}] -> RandomReal[1, 200], Method -> {"RandomForest", "FeatureFraction" -> .3}]

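In the "RandomForest" method, "FeatureFraction" sets how many features each tree can use. A value below 1 makes the trees less similar to one another, which can help reduce overfitting.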

Use the "Titanic" training set to train two classifiers with different values of "FeatureFraction":

Wolfram Language code: data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];

Wolfram Language code: c1 = Classify[data, Method -> {"RandomForest", "FeatureFraction" -> 1}];

Wolfram Language code: c2 = Classify[data, Method -> {"RandomForest", "FeatureFraction" -> .6}]

Compare the accuracy of these classifiers on both the test set and the training set:

Wolfram Language code: testdata = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];

Wolfram Language code:

Dataset@<|"c1" -> AssociationThread[{"TrainingSet", "TestSet"}, ClassifierMeasurements[c1, #, "Accuracy"]& /@ {data, testdata}] , 
	"c2" -> AssociationThread[{"TrainingSet", "TestSet"}, ClassifierMeasurements[c2, #, "Accuracy"]& /@ {data, testdata}]|>

"LeafSize" (1)

Use the "Titanic" training set to train two classifiers with different values of "LeafSize":

Wolfram Language code: data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];

Wolfram Language code: {c1, c2} = Classify[data, Method -> {"RandomForest", "LeafSize" -> #}]& /@ {1, 500};

Compare the size of the corresponding forests:

Wolfram Language code: ByteCount[#[[1, "Model", "Trees"]]]& /@ {c1, c2}
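
Leaf size can affect predictive performance as well as model size. A sketch, using only functions that appear elsewhere on this page, that compares test-set accuracy for the same two values; the key names are illustrative and no particular result is implied:

Wolfram Language code:

data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
testdata = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
(* one classifier per leaf size *)
{c1, c2} = Classify[data, Method -> {"RandomForest", "LeafSize" -> #}]& /@ {1, 500};
(* accuracy of each classifier on the held-out test set *)
AssociationThread[{"LeafSize 1", "LeafSize 500"}, 
	ClassifierMeasurements[#, testdata, "Accuracy"]& /@ {c1, c2}]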

"TreeNumber" (1)

Use the "Mushroom" training set to train two classifiers with different values of "TreeNumber":

Wolfram Language code: data = ExampleData[{"MachineLearning", "Mushroom"}, "TrainingData"];

Wolfram Language code: {c1, c2} = Classify[data, Method -> {"RandomForest", "TreeNumber" -> #}]& /@ {3, 30};

Look at the training time of these classifiers:

Wolfram Language code: Information[#, "TrainingTime"]& /@ {c1, c2}
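
Training time is usually weighed against accuracy when choosing the number of trees. A sketch that tabulates both for the same two values; it assumes the "Mushroom" example data provides a "TestData" element in the same way as "Titanic", and the names treeNumbers and classifiers are illustrative:

Wolfram Language code:

data = ExampleData[{"MachineLearning", "Mushroom"}, "TrainingData"];
(* assumed to exist, by analogy with the "Titanic" test set used above *)
testdata = ExampleData[{"MachineLearning", "Mushroom"}, "TestData"];
treeNumbers = {3, 30};
classifiers = Classify[data, Method -> {"RandomForest", "TreeNumber" -> #}]& /@ treeNumbers;
(* one row per tree number: training time and test-set accuracy *)
Dataset@AssociationThread[treeNumbers, 
	<|"TrainingTime" -> Information[#, "TrainingTime"], 
		"TestAccuracy" -> ClassifierMeasurements[#, testdata, "Accuracy"]|>& /@ classifiers]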
