EECS 492 Chapter 18: Learning From Examples

An agent is

**learning**if it improves its performance on future tasks after making observations about the world.This chapter: From a collection of input-output pairs, learn a function that predicts the output for new inputs.

Task

Given a**training set**of N example input-output pairs.

\((x_1, y_1), (x_2, y_2), ... (x_N, y_N)\)

where each \(y_j\) was generated by an unkown function \(y = f(x)\), discover a function \(h\) that approximates the true function \(f\).Function \(h\): Hypothesis.

Learning is a search through space of possible hypotheses for one that performs well, even on new examples.

To measure accuracy of a hypothesis, we use a

**test set**of examples distinct from the training set.A hypothesis

**generalizes**if it correctly predits the value of \(y\) for novel examples.Sometimes \(f\) is stochastic-not strictly a function of \(x\)- which means that we have to learn a conditional probability distribution \(P(Y|x)\)

Types of Learning Problems

Classification: Type of learning problem for which the output \(y\) is one of a finite set of values (such as \(sunny\), \(cloudy\), or \(rainy\)).

Regression: Type of learning problem for which the output \(y\) is a number (such as temperature).

**Ockham’s Razor**: Choose the*simplest*hypothesis consistent with the data.“In general, there is a tradeoff between complex hypotheses that fit the training data well and simpler hypotheses that may generalize better.”

Supervised learning is done by choosing the hypothesis \(h^*\) that is most probably given the data:

\(h^*=argmax_{h\in H}\,P(h|data).\)

By Bayes: \(h^*=argmax_{h\in H}\,P(data|h)P(h).\)

\(P(h)\) is high for a degree 1/2 polynomial and low for a higher degree polynomial.

This block failed to display. Double-click this text to correct any errors. Your changes are saved.

So far: Looked at learning methods in which a single hypothesis, chosen from a hypothesis space, is used to make predictions

**Ensemble Learning**: Select a collection, or**ensemble**, of hypotheses from the hypothesis space and combine their predictions.Motivation: Consider we have an ensemble of K=5 hypotheses and suppose we combine their predictions using simple majority voting. For the ensemble to misclassify a new example,

*at least three of the five hypotheses have to misclassify it*. That is, there is a lower chance of misclassification than with a single hypothesis.Suppose each hypothesis \(h_k\) in the ensemble has an error of \(p\) (the probability that a randomly chosen example is misclassified by \(h_k\) is \(p\).

Suppose that the errors made by each hypothesis a

## Share on Social Media