Random forest
Appearance
In machine learning, a Random Forest is a classifier that consists of many decision trees and outputs the class that is the mode of the classes output by individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their trademark.
Each tree is constructed using the following algorithm:
- Let the number of training cases be N, and the number of variables in the classifier be M.
- We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M.
- Choose a training set for this tree by choosing N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree, by predicting their classes.
- For each node of the tree, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set.
- Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
The advantages of random forest are:
- For many data sets, it produces a highly accurate classifier.
- It handles a very large number of input variables.
- It estimates the importance of variables in determining classification.
- It generates an internal unbiased estimate of the generalization error as the forest building progresses.
- It includes a good method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
- It provides an experimental way to detect variable interactions.
- It can balance error in class population unbalanced data sets.
- It computes proximities between cases, useful for clustering, detecting outliers, and (by scaling) visualizing the data.
- Using the above, it can be extended to unlabeled data, leading to unsupervised clustering, outlier detection and data views.
- Learning is fast.
External links
- Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1), 5-32 (Original Article)
- Random Forest classifier description (Site of Leo Breiman)
- R News (2002) Vol. 2/3 p. 18 (Discussion of the use of the random forest package for R)