Information Gain
Entropy of a boolean variable that is true with probability q
\(B(q) = -\left(q \log_2 q + (1-q) \log_2 (1-q)\right)\)
Entropy of the goal attribute on the whole set, which has \(p\) positive examples and \(n\) negative examples, is \(B(\frac{p}{p+n})\)
\(\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)\)
\(\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})\) where attribute \(A\) has \(d\) distinct values that split the set into subsets \(E_1, \dots, E_d\), and each subset \(E_k\) has \(p_k\) positive examples and \(n_k\) negative examples.
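As a rough illustration, the formulas above can be sketched in Python. The function names (boolean_entropy, remainder, gain) and the example counts are assumptions made for this sketch, not part of the notes. Each subset \(E_k\) is given as a pair (p_k, n_k), and \(0 \log_2 0\) is treated as 0.

```python
import math


def boolean_entropy(q):
    """B(q) in bits. The 0 * log2(0) term is taken to be 0 by convention."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def remainder(subsets, p, n):
    """Weighted entropy left after splitting; subsets is a list of (p_k, n_k)."""
    total = 0.0
    for p_k, n_k in subsets:
        if p_k + n_k == 0:
            continue  # an empty subset contributes nothing
        weight = (p_k + n_k) / (p + n)
        total += weight * boolean_entropy(p_k / (p_k + n_k))
    return total


def gain(subsets):
    """Gain(A) = B(p / (p + n)) - Remainder(A), with p and n summed over subsets."""
    p = sum(p_k for p_k, _ in subsets)
    n = sum(n_k for _, n_k in subsets)
    return boolean_entropy(p / (p + n)) - remainder(subsets, p, n)


# Hypothetical attribute with d = 3 values on a set of 6 positive, 6 negative examples.
example_subsets = [(0, 2), (4, 0), (2, 4)]
print(round(gain(example_subsets), 3))  # 0.541
```

In this made-up split, two of the three subsets are pure, so only the (2, 4) subset contributes to the remainder. An attribute whose subsets all keep the original 50/50 ratio would give a gain of 0.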
Decision Trees