Ensemble Learning (Section 18.10, p. 748)

  • So far: we have looked at learning methods in which a single hypothesis, chosen from a hypothesis space, is used to make predictions.

  • Ensemble Learning: Select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions.

  • Motivation: Suppose we have an ensemble of \(K = 5\) hypotheses and we combine their predictions by simple majority voting. For the ensemble to misclassify a new example, at least three of the five hypotheses have to misclassify it, so the chance of misclassification is lower than with a single hypothesis.

    • Suppose each hypothesis \(h_k\) in the ensemble has an error of \(p\) (i.e., the probability that a randomly chosen example is misclassified by \(h_k\) is \(p\)).

    • Suppose that the errors made by each hypothesis are independent.

    • Under these assumptions, the ensemble errs only when at least three of the five hypotheses err, which happens with probability \(\sum_{i=3}^{5}\binom{5}{i}p^{i}(1-p)^{5-i}\). For \(p = 0.1\) this is under 1%, far below the error of any single hypothesis (a small numeric check follows this list).
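
A small numeric check of this argument, as a minimal Python sketch; the values \(p = 0.1\) and \(K = 5\) are illustrative choices, and independence of the errors is assumed as above:

```python
from math import comb

def ensemble_error(p: float, K: int = 5) -> float:
    """Probability that a majority of K hypotheses, each with independent
    error p, misclassifies a randomly chosen example."""
    majority = K // 2 + 1  # smallest number of wrong votes that flips the ensemble
    return sum(comb(K, i) * p**i * (1 - p)**(K - i) for i in range(majority, K + 1))

# With p = 0.1 and K = 5 this prints roughly 0.0086, i.e. under 1%,
# versus an error of 0.1 for any single hypothesis.
print(ensemble_error(0.1, 5))
```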


Boosting

  • Weighted Training Set

    • Each example has an associated weight \(w_j \geq 0\).

    • The higher the weight, the greater the importance attached to that example while a hypothesis is being learned.

  • Boosting

    • Begin with \(w_j = 1\) for all examples. From this weighted set, the learning algorithm generates the first hypothesis, \(h_1\).

    • Increase the weights of the examples that \(h_1\) misclassified and decrease the weights of the correctly classified examples, then generate \(h_2\) from the reweighted set.

    • Continue this process until \(K\) hypotheses have been generated (where \(K\) is an input to the algorithm).

    • The final ensemble hypothesis is a weighted-majority combination of all \(K\) hypotheses, each weighted according to how well it performed on the training set.

  • ADABOOST

    • If the input learning algorithm \(L\) is a weak learning algorithm (\(L\) always returns a hypothesis with accuracy slightly better than random guessing, i.e., \(50\% + \epsilon\)), then ADABOOST will return a hypothesis that classifies the training data perfectly for large enough \(K\).

  • Decision stumps: decision trees with just one test at the root; they serve as the original hypothesis space for the weak learner (a minimal ADABOOST sketch using stumps follows below).
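
To make the boosting loop above concrete, here is a minimal sketch of ADABOOST with decision stumps as the weak learner. The \(\{-1, +1\}\) label encoding, the exhaustive stump search, and the toy data set are illustrative assumptions; the weight update (multiply correctly classified examples by \(\text{error}/(1-\text{error})\), then renormalize) and the vote weight \(z_k = \log\big((1-\text{error})/\text{error}\big)\) follow the standard ADABOOST formulation.

```python
import math

def stump_learner(X, y, w):
    """Weak learner L: return the decision stump (a single test "feature <= threshold")
    with the lowest weighted error on the training set, together with that error."""
    best = None  # (error, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (+1, -1):
                preds = [polarity if x[f] <= t else -polarity for x in X]
                err = sum(wj for wj, pred, yj in zip(w, preds, y) if pred != yj)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    err, f, t, polarity = best
    def h(x, f=f, t=t, polarity=polarity):
        return polarity if x[f] <= t else -polarity
    return h, err

def adaboost(X, y, K):
    """Return a weighted-majority ensemble of K decision stumps (labels must be -1/+1)."""
    N = len(X)
    w = [1.0 / N] * N                       # start with equal example weights
    hypotheses, z = [], []                  # the K stumps and their vote weights
    for _ in range(K):
        h, error = stump_learner(X, y, w)
        error = min(max(error, 1e-10), 1 - 1e-10)     # guard against division by zero
        # Decrease the weights of correctly classified examples, keep the others ...
        w = [wj * (error / (1 - error)) if h(x) == yj else wj
             for wj, x, yj in zip(w, X, y)]
        total = sum(w)
        w = [wj / total for wj in w]        # ... then renormalize so the weights sum to 1
        hypotheses.append(h)
        z.append(math.log((1 - error) / error))       # better stumps get a larger vote
    def ensemble(x):
        vote = sum(zk * hk(x) for zk, hk in zip(z, hypotheses))
        return +1 if vote >= 0 else -1      # weighted-majority combination
    return ensemble

# Toy 1-D example: no single stump can separate these labels,
# but the boosted ensemble classifies all six points correctly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [+1, +1, -1, -1, +1, +1]
H = adaboost(X, y, K=5)
print([H(x) for x in X])    # [1, 1, -1, -1, 1, 1]
```

The toy run illustrates the point of L34 above: each individual stump is weak (its best error here is 2/6), yet the weighted-majority combination fits the training set perfectly once enough hypotheses have been generated.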