Decision Learning

Information Gain
- Entropy of a boolean variable that is true with probability q
  - \(B(q) = -(q \log_2 q + (1-q) \log_2 (1-q))\)
- Entropy of the goal attribute on the whole set, which has \(p\) positive and \(n\) negative examples, is \(B(\frac{p}{p+n})\)
- \(\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)\)
- \(\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})\) where attribute \(A\) splits the set into \(d\) subsets \(E_1, \dots, E_d\), and each subset \(E_k\) has \(p_k\) positive examples and \(n_k\) negative examples.
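
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`boolean_entropy`, `remainder`, `gain`) and the example counts are assumptions made for this sketch, and each subset \(E_k\) is represented as a pair `(p_k, n_k)`. The sketch also assumes the usual convention that \(0 \log_2 0 = 0\).

```python
import math


def boolean_entropy(q):
    """B(q) in bits for a boolean variable that is true with probability q."""
    # Convention assumed here: 0 * log2(0) is treated as 0.
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def remainder(subsets):
    """Remainder(A), given the subsets E_k as (p_k, n_k) pairs."""
    total = sum(pk + nk for pk, nk in subsets)  # equals p + n
    return sum(
        (pk + nk) / total * boolean_entropy(pk / (pk + nk))
        for pk, nk in subsets
        if pk + nk > 0  # empty subsets contribute nothing
    )


def gain(p, n, subsets):
    """Gain(A) = B(p / (p + n)) - Remainder(A)."""
    return boolean_entropy(p / (p + n)) - remainder(subsets)


# Hypothetical counts: 6 positive and 6 negative examples, split by some
# attribute into three subsets with (p_k, n_k) = (0, 2), (4, 0), (2, 4).
example_gain = gain(6, 6, [(0, 2), (4, 0), (2, 4)])
print(round(example_gain, 3))
```

With these counts the whole set has entropy \(B(\frac{1}{2}) = 1\) bit, the first two subsets are pure and contribute nothing to the remainder, and only the third subset, weighted by \(\frac{6}{12}\), reduces the gain.
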
Decision Trees