Task
Given a training set of N example input-output pairs.
\((x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\)
where each \(y_j\) was generated by an unknown function \(y = f(x)\), discover a function \(h\) that approximates the true function \(f\).
Function \(h\): Hypothesis.
Learning is a search through space of possible hypotheses for one that performs well, even on new examples.
To measure accuracy of a hypothesis, we use a test set of examples distinct from the training set.
A hypothesis generalizes if it correctly predicts the value of \(y\) for novel examples.
Sometimes \(f\) is stochastic (not strictly a function of \(x\)), which means that we have to learn a conditional probability distribution \(P(Y|x)\).
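A minimal sketch of this setup, with made-up data and a toy hypothesis space (threshold classifiers): learning searches the space for the hypothesis that fits the training set, and a separate test set estimates how well it generalizes.

```python
train = [(1, 0), (2, 0), (3, 1), (4, 1)]   # (x, y) pairs; y = f(x), f unknown
test  = [(1.5, 0), (3.5, 1)]               # distinct examples used only for evaluation

# Hypothesis space H: threshold classifiers h_t(x) = 1 if x >= t, else 0
def make_h(t):
    return lambda x: 1 if x >= t else 0

def accuracy(h, examples):
    return sum(h(x) == y for x, y in examples) / len(examples)

# Learning as search: pick the hypothesis that performs best on the training set
H = [make_h(t) for t in (1, 2, 2.5, 3, 4)]
h_star = max(H, key=lambda h: accuracy(h, train))

print(accuracy(h_star, train))  # fit to the training data
print(accuracy(h_star, test))   # generalization, measured on unseen examples
```

Here the search is exhaustive because \(H\) is tiny; real learners search much larger hypothesis spaces with more efficient methods.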
Types of Learning Problems
Classification: Type of learning problem for which the output \(y\) is one of a finite set of values (such as \(sunny\), \(cloudy\), or \(rainy\)).
Regression: Type of learning problem for which the output \(y\) is a number (such as temperature).
Ockham’s Razor: Choose the simplest hypothesis consistent with the data.
“In general, there is a tradeoff between complex hypotheses that fit the training data well and simpler hypotheses that may generalize better.”
Supervised learning is done by choosing the hypothesis \(h^*\) that is most probable given the data:
\(h^*=\operatorname{argmax}_{h\in H}\,P(h\mid data).\)
By Bayes' rule: \(h^*=\operatorname{argmax}_{h\in H}\,P(data\mid h)\,P(h).\)
\(P(h)\) is high for a degree-1 or degree-2 polynomial and low for a higher-degree polynomial.
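A small worked sketch of this argmax over a discrete hypothesis space. The likelihoods \(P(data\mid h)\) and priors \(P(h)\) below are made-up numbers for illustration: the complex degree-12 hypothesis fits the data best, but its low prior means Bayes' rule still prefers a simple hypothesis.

```python
# Hypothetical values: P(data|h) and P(h) for three polynomial hypotheses
hypotheses = {
    "degree-1":  {"likelihood": 0.30, "prior": 0.50},
    "degree-2":  {"likelihood": 0.35, "prior": 0.40},
    "degree-12": {"likelihood": 0.40, "prior": 0.01},  # fits best, but implausibly complex
}

def posterior_score(h):
    # Unnormalized posterior: P(data|h) * P(h)
    return hypotheses[h]["likelihood"] * hypotheses[h]["prior"]

h_star = max(hypotheses, key=posterior_score)
print(h_star)  # → degree-1
```

The normalizing constant \(P(data)\) is the same for every \(h\), so it can be dropped from the argmax, which is why the unnormalized product suffices.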