Ensemble Learning (Section 18.10, p. 748)

  • So far: we have looked at learning methods in which a single hypothesis, chosen from a hypothesis space, is used to make predictions.

  • Ensemble Learning: Select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions.

  • Motivation: Suppose we have an ensemble of \(K = 5\) hypotheses and we combine their predictions by simple majority voting. For the ensemble to misclassify a new example, at least three of the five hypotheses have to misclassify it, so the chance of misclassification is lower than with a single hypothesis.

    • Suppose each hypothesis \(h_k\) in the ensemble has an error of \(p\) (i.e., the probability that a randomly chosen example is misclassified by \(h_k\) is \(p\)).

    • Suppose that the errors made by each hypothesis are independent.

    • Under these assumptions, the ensemble errs only when at least three of the five hypotheses err, which happens with probability \(\sum_{i=3}^{5}\binom{5}{i}p^{i}(1-p)^{5-i}\). For \(p = 0.1\) this is under 1%, far below the error of any single hypothesis (a small numeric check follows this list).
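
A small numeric check of this argument, as a minimal Python sketch; the values \(p = 0.1\) and \(K = 5\) are illustrative choices, and independence of the errors is assumed as above:

```python
from math import comb

def ensemble_error(p: float, K: int = 5) -> float:
    """Probability that a majority of K hypotheses, each with independent
    error p, misclassifies a randomly chosen example."""
    majority = K // 2 + 1  # smallest number of wrong votes that flips the ensemble
    return sum(comb(K, i) * p**i * (1 - p)**(K - i) for i in range(majority, K + 1))

# With p = 0.1 and K = 5 this prints roughly 0.0086, i.e. under 1%,
# versus an error of 0.1 for any single hypothesis.
print(ensemble_error(0.1, 5))
```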


Boosting

  • Weighted Training Set

    • Each example has an associated weight \(w_j \geq 0\).

    • The higher the weight, the greater the importance attached to that example while a hypothesis is being learned.

  • Boosting

    • Begin with \(w_j = 1\) for all examples. From this weighted set, the learning algorithm generates the first hypothesis, \(h_1\).

    • Increase the weights of the examples that \(h_1\) misclassified and decrease the weights of the correctly classified examples, then generate \(h_2\) from the reweighted set.

    • Continue this process until \(K\) hypotheses have been generated (where \(K\) is an input to the algorithm).

    • The final ensemble hypothesis is a weighted-majority combination of all \(K\) hypotheses, each weighted according to how well it performed on the training set.

  • ADABOOST

    • If the input learning algorithm \(L\) is a weak learning algorithm (\(L\) always returns a hypothesis with accuracy slightly better than random guessing, i.e., \(50\% + \epsilon\)), then ADABOOST will return a hypothesis that classifies the training data perfectly for large enough \(K\).

  • Decision stumps: decision trees with just one test at the root; they serve as the original hypothesis space for the weak learner (a minimal ADABOOST sketch using stumps follows below).
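
To make the boosting loop above concrete, here is a minimal sketch of ADABOOST with decision stumps as the weak learner. The \(\{-1, +1\}\) label encoding, the exhaustive stump search, and the toy data set are illustrative assumptions; the weight update (multiply correctly classified examples by \(\text{error}/(1-\text{error})\), then renormalize) and the vote weight \(z_k = \log\big((1-\text{error})/\text{error}\big)\) follow the standard ADABOOST formulation.

```python
import math

def stump_learner(X, y, w):
    """Weak learner L: return the decision stump (a single test "feature <= threshold")
    with the lowest weighted error on the training set, together with that error."""
    best = None  # (error, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (+1, -1):
                preds = [polarity if x[f] <= t else -polarity for x in X]
                err = sum(wj for wj, pred, yj in zip(w, preds, y) if pred != yj)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    err, f, t, polarity = best
    def h(x, f=f, t=t, polarity=polarity):
        return polarity if x[f] <= t else -polarity
    return h, err

def adaboost(X, y, K):
    """Return a weighted-majority ensemble of K decision stumps (labels must be -1/+1)."""
    N = len(X)
    w = [1.0 / N] * N                       # start with equal example weights
    hypotheses, z = [], []                  # the K stumps and their vote weights
    for _ in range(K):
        h, error = stump_learner(X, y, w)
        error = min(max(error, 1e-10), 1 - 1e-10)     # guard against division by zero
        # Decrease the weights of correctly classified examples, keep the others ...
        w = [wj * (error / (1 - error)) if h(x) == yj else wj
             for wj, x, yj in zip(w, X, y)]
        total = sum(w)
        w = [wj / total for wj in w]        # ... then renormalize so the weights sum to 1
        hypotheses.append(h)
        z.append(math.log((1 - error) / error))       # better stumps get a larger vote
    def ensemble(x):
        vote = sum(zk * hk(x) for zk, hk in zip(z, hypotheses))
        return +1 if vote >= 0 else -1      # weighted-majority combination
    return ensemble

# Toy 1-D example: no single stump can separate these labels,
# but the boosted ensemble classifies all six points correctly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [+1, +1, -1, -1, +1, +1]
H = adaboost(X, y, K=5)
print([H(x) for x in X])    # [1, 1, -1, -1, 1, 1]
```

The toy run illustrates the point of L34 above: each individual stump is weak (its best error here is 2/6), yet the weighted-majority combination fits the training set perfectly once enough hypotheses have been generated.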