Decision Learning

  • Information Gain

    • Entropy of a boolean variable that is true with probability q

      • \(B(q) = -\left(q \log_2 q + (1-q) \log_2 (1-q)\right)\)

    • Entropy of the goal attribute on the whole set, which contains \(p\) positive and \(n\) negative examples, is \(B(\frac{p}{p+n})\)

    • \(\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)\)

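    • \(\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})\), where attribute \(A\) has \(d\) distinct values that split the set into subsets \(E_1, \dots, E_d\), and each subset \(E_k\) has \(p_k\) positive examples and \(n_k\) negative examples.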
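
The formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (`boolean_entropy`, `remainder`, `gain`) and the representation of an attribute's split as a list of `(p_k, n_k)` count pairs are assumptions made here, and the example counts are purely illustrative. The sketch also assumes the usual convention that \(0 \log_2 0 = 0\), so \(B(0) = B(1) = 0\).

```python
import math


def boolean_entropy(q):
    """B(q): entropy in bits of a boolean variable that is true with probability q."""
    # Convention assumed here: 0 * log2(0) is treated as 0.
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def remainder(subsets):
    """Remainder(A) for a split given as a list of (p_k, n_k) count pairs."""
    total = sum(p_k + n_k for p_k, n_k in subsets)
    result = 0.0
    for p_k, n_k in subsets:
        size = p_k + n_k
        if size == 0:
            continue  # an empty subset contributes nothing to the sum
        result += (size / total) * boolean_entropy(p_k / size)
    return result


def gain(subsets):
    """Gain(A) = B(p / (p + n)) - Remainder(A)."""
    p = sum(p_k for p_k, _ in subsets)
    n = sum(n_k for _, n_k in subsets)
    return boolean_entropy(p / (p + n)) - remainder(subsets)


# Illustrative counts: 6 positive and 6 negative examples in total.
informative_split = [(0, 2), (4, 0), (2, 4)]    # subsets are mostly pure
uninformative_split = [(3, 3), (3, 3)]          # subsets mirror the whole set

print(round(gain(informative_split), 3))    # about 0.541 bits
print(round(gain(uninformative_split), 3))  # 0.0 bits
```

A split whose subsets have the same positive-to-negative ratio as the whole set yields zero gain, while a split that produces nearly pure subsets yields a gain close to the entropy of the whole set.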

  • Decision Trees