"RandomForest" (Machine Learning Method)

Details & Suboptions

  • Random forest is an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees. The forest prediction is the most common class among the tree predictions (classification) or the mean of the tree predictions (regression). Each decision tree is trained on a random subset of the training set, drawn as in bootstrap aggregating (bagging), and uses only a random subset of the features.
  • The following options can be given:
      "DistributionSmoothing"   0.5         regularization parameter
      "FeatureFraction"         Automatic   the fraction of features to be randomly selected to train each tree
      "LeafSize"                Automatic   the maximum number of examples in each leaf
      "TreeNumber"              Automatic   the number of trees in the forest
  • "FeatureFraction", "LeafSize" and "DistributionSmoothing" can be used to control overfitting.
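The ingredients described above are bootstrap sampling of rows, a random subset of features per tree, and aggregation by majority vote or mean. The minimal Python sketch below shows each of them in isolation. It is not the Wolfram Language implementation, and the function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def feature_subset(n_features, fraction, rng):
    """Pick a random subset of feature indices; keep at least one feature."""
    k = max(1, min(n_features, round(fraction * n_features)))
    return sorted(rng.sample(range(n_features), k))

def aggregate_classification(tree_votes):
    """Forest class = most common class among the tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_values):
    """Forest value = mean of the tree predictions."""
    return mean(tree_values)

rng = random.Random(0)
print(bootstrap_sample(list(range(10)), rng))
print(feature_subset(8, 0.5, rng))                           # 4 of the 8 feature indices
print(aggregate_classification(["A", "B", "A", "A", "B"]))  # A
print(aggregate_regression([1.0, 2.0, 3.0]))                # 2.0
```

A bootstrap sample usually repeats some rows and omits others. This is what makes the trees differ even before feature subsetting is applied.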

Examples


Basic Examples  (3)

Train a predictor on labeled examples:

Obtain information about the predictor:

Predict a new example:
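The three steps above (train, inspect, predict a new example) can be sketched outside the Wolfram Language. The following is a hypothetical, minimal Python illustration, not the implementation behind Predict. It assumes one numeric feature, and each tree is a depth-1 regression tree (a stump) trained on a bootstrap sample. The forest predicts the mean of its trees. The names fit_stump, train_forest and predict_forest are made up for this sketch.

```python
import random

def avg(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Depth-1 regression tree on one numeric feature: pick the threshold
    that minimizes the squared error when each side predicts its mean."""
    best = (float("inf"), float("inf"), avg(ys), avg(ys))  # fallback: constant tree
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = avg(left), avg(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left mean, right mean)

def train_forest(xs, ys, tree_number=25, seed=0):
    """Train each tree on its own bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(tree_number):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Forest prediction = mean of the tree predictions."""
    return avg([lm if x <= t else rm for t, lm, rm in forest])

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
forest = train_forest(xs, ys)
print(len(forest))                  # number of trees: 25
print(predict_forest(forest, 5.5))  # prediction for a new example
```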

Train a classifier function on labeled examples:

Plot the probabilities that an example belongs to class "A" or class "B" as functions of the feature, and compare them:
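One way a forest can turn tree outputs into class probabilities is to average the class frequencies in the leaves that an example reaches. The Python sketch below assumes exactly that; it is not a statement of how Classify computes its probabilities. It uses depth-1 classification trees with a Gini split on one feature. All function names are hypothetical.

```python
import random
from collections import Counter

def class_distribution(labels, classes):
    """Fraction of each class among the examples that reach a leaf."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}

def gini(labels):
    counts = Counter(labels)
    return 1.0 - sum((k / len(labels)) ** 2 for k in counts.values())

def fit_stump(xs, ys, classes):
    """Depth-1 classification tree on one numeric feature (Gini criterion)."""
    whole = class_distribution(ys, classes)
    best = (float("inf"), float("inf"), whole, whole)  # fallback: constant tree
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if score < best[0]:
            best = (score, t, class_distribution(left, classes),
                    class_distribution(right, classes))
    return best[1:]  # (threshold, left distribution, right distribution)

def train_forest(xs, ys, tree_number=30, seed=0):
    classes = sorted(set(ys))
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(tree_number):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], classes))
    return forest, classes

def forest_probabilities(forest, classes, x):
    """Average the class distributions of the leaves that x reaches."""
    probs = {c: 0.0 for c in classes}
    for t, left, right in forest:
        leaf = left if x <= t else right
        for c in classes:
            probs[c] += leaf[c] / len(forest)
    return probs

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest, classes = train_forest(xs, ys)
for x in (0, 3.5, 9):
    p = forest_probabilities(forest, classes, x)
    print(x, round(p["A"], 2), round(p["B"], 2))
```

Evaluating forest_probabilities on a grid of x values gives the two curves to plot. Near the class boundary the trees disagree, so the probability changes gradually rather than jumping from 0 to 1.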

Train a predictor function on labeled data:

Compare the data with the predicted values and look at the standard deviation:
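The page does not say how the standard deviation reported by the predictor is computed. The Python sketch below therefore shows two common quantities, neither of which is claimed to be what Predict reports:

- the root-mean-square difference between the data and the forest predictions;
- the spread of the individual tree predictions for one example.

The per-tree predictions are toy numbers, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def forest_prediction(tree_predictions):
    """Mean of the individual tree predictions for one example."""
    return mean(tree_predictions)

def tree_spread(tree_predictions):
    """Population standard deviation of the tree predictions for one example,
    one possible (assumed) indicator of the forest's uncertainty there."""
    return pstdev(tree_predictions)

def residual_std(observed, predicted):
    """Root-mean-square difference between the data and the predictions."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))

# Toy per-tree predictions for three examples (rows) from four trees.
per_tree = [[1.0, 1.2, 0.8, 1.0],
            [2.0, 2.5, 1.5, 2.0],
            [3.0, 3.0, 3.0, 3.0]]
observed = [1.1, 1.9, 3.2]
predicted = [forest_prediction(row) for row in per_tree]
print(predicted)  # [1.0, 2.0, 3.0]
print([tree_spread(row) for row in per_tree])
print(residual_std(observed, predicted))
```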

Options  (6)

"DistributionSmoothing"  (2)

Train a classifier using the "DistributionSmoothing" suboption:

Use the "Titanic" training set to train a classifier with the default value of "DistributionSmoothing":

Train a second classifier with a large value of "DistributionSmoothing":

Compare the probabilities for examples from a test set:
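This page only calls "DistributionSmoothing" a regularization parameter. As an assumption, the Python sketch below treats it like additive (Laplace-style) smoothing of the class counts in a leaf, p(c) = (n_c + s) / (N + s·K), where K is the number of classes. This shows why a large value pulls probabilities toward uniform. The leaf labels are a toy example, and the function name is hypothetical.

```python
from collections import Counter

def smoothed_distribution(leaf_labels, classes, smoothing):
    """Additive smoothing: add `smoothing` pseudo-counts to every class."""
    counts = Counter(leaf_labels)
    total = len(leaf_labels) + smoothing * len(classes)
    return {c: (counts[c] + smoothing) / total for c in classes}

classes = ["died", "survived"]
leaf = ["survived"] * 3  # a small, pure leaf (toy data)
print(smoothed_distribution(leaf, classes, 0.5))   # {'died': 0.125, 'survived': 0.875}
print(smoothed_distribution(leaf, classes, 10.0))  # much closer to 0.5 / 0.5
```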

"FeatureFraction"  (2)

Train a predictor on high-dimensional data using the "FeatureFraction" suboption:

In the "RandomForest" method, a moderate "FeatureFraction" (neither very small nor close to 1) helps prevent overfitting.
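Following the option table, features are drawn once per tree. (Some other random forest implementations draw them per split instead.) The Python sketch below shows how the fraction controls the subsets. With a fraction of 1.0 every tree sees every feature, so the trees become more alike. A very small fraction gives each tree only a few features. The function name and the rounding rule are assumptions.

```python
import random

def features_per_tree(n_features, fraction, tree_number, seed=0):
    """For each tree, draw a random subset of feature indices without
    replacement; the subset size is fraction * n_features (at least 1)."""
    rng = random.Random(seed)
    k = max(1, min(n_features, round(fraction * n_features)))
    return [sorted(rng.sample(range(n_features), k)) for _ in range(tree_number)]

for fraction in (0.1, 1.0):
    subsets = features_per_tree(n_features=20, fraction=fraction, tree_number=3)
    print(fraction, subsets)
```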

Use the "Titanic" training set to train two classifiers with different values of "FeatureFraction":

Compare the accuracy of these classifiers on both the test set and the training set:

"LeafSize"  (1)

Use the "Titanic" training set to train two classifiers with different values of "LeafSize":

Compare the size of the corresponding forests:
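A larger "LeafSize" stops splitting earlier, so each tree, and therefore the forest, has fewer leaves. The Python sketch below illustrates this under two simplifying assumptions. First, a node stops splitting once it is pure or holds at most leaf_size examples, following the table's description. Second, every split is made at the median feature value instead of by searching for the best split. The function names are hypothetical.

```python
import random

def count_leaves(pairs, leaf_size):
    """Grow a tree over (x, label) pairs and return its number of leaves.
    A node becomes a leaf when it is pure or holds at most leaf_size examples;
    otherwise it is split at its median x (a simplification of a real split search)."""
    if len(pairs) <= leaf_size or len({y for _, y in pairs}) <= 1:
        return 1
    pairs = sorted(pairs)
    mid = len(pairs) // 2
    return count_leaves(pairs[:mid], leaf_size) + count_leaves(pairs[mid:], leaf_size)

def forest_size(pairs, leaf_size, tree_number=10, seed=0):
    """Total number of leaves over trees grown on bootstrap samples.
    The same seed gives the same samples, so only leaf_size differs between calls."""
    rng = random.Random(seed)
    total = 0
    for _ in range(tree_number):
        sample = [rng.choice(pairs) for _ in pairs]
        total += count_leaves(sample, leaf_size)
    return total

data = [(x, x % 2) for x in range(64)]  # alternating labels: every node is impure
print(count_leaves(data, 1), count_leaves(data, 8))  # 64 8
print(forest_size(data, 1), forest_size(data, 8))
```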

"TreeNumber"  (1)

Use the "Mushroom" training set to train two classifiers with different values of "TreeNumber":

Look at the training time of these classifiers:
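Each tree is trained independently on its own bootstrap sample. The training cost of a forest is therefore expected to grow roughly in proportion to the number of trees. This is a general property of bagging and not a measurement of Classify. The Python sketch below times a hypothetical forest of regression stumps for two tree counts. The timings depend on the machine.

```python
import random
import time

def avg(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Depth-1 regression tree on one numeric feature (least-squares threshold)."""
    best = (float("inf"), float("inf"), avg(ys), avg(ys))  # fallback: constant tree
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = avg(left), avg(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def train_forest(xs, ys, tree_number, seed=0):
    """Trees are independent, so the work grows roughly linearly with tree_number."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(tree_number):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample for this tree
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
for tree_number in (10, 100):
    start = time.perf_counter()
    forest = train_forest(xs, ys, tree_number)
    elapsed = time.perf_counter() - start
    print(tree_number, len(forest), f"{elapsed:.3f} s")
```

Because the trees do not depend on each other, they could also be trained in parallel. That would change the wall-clock time but not the total amount of work.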

Introduced in 2014 (10.0) | Updated in 2017 (11.2)