EECS 492 Review Material


  • GraphPlan

    • Mutex Conditions

      • Inconsistent Effects: The effects of one negates the effects of the other

      • Complements: (Literal) A and !A

      • Interference: One deletes the pre-conditions of another

      • Inconsistent Support: (Literal) Actions are mutex

      • Competing Needs: Mutually exclusive preconditions

  • Partial Order Planning

    • Setup

      • Steps

        • Add Start (with postconditions) and Finish (with preconditions)

      • Causal Links

      • Ordering Constraints

Bayesian Networks

  • Querying Information

    • \(Pr(x_1,...x_n)=\prod\limits_{i=1}^{n} Pr(x_i|\,parents(X_i))\)

    • For probabilities not specified, we need to take the sum of all possibilities inside the products.

  • Conditional Independence

    • A node is conditionally independent of its other predecessors, given its parents.

    • Each variable is conditionally independent of its non-descendents, given its parents

    • Markov blanket: A node is conditionally independent of all other nodes in the blanket, given its parents, children, and children’s parents.

    • Three Conditions for all paths

      • Common parent known

      • Middle of chain known

      • Common child unknown

  • Bayes rule: flip

Decision Learning

  • Information Gain

    • Entropy of a boolean variable that is true with probability q

      • \(B(q) = -(q \text{ log}_2q+(1-q)\text{ log}_2(1-q))\)

    • Entropy of the goal attribute on the whole set is \(B(\frac{p}{p+n})\)

    • \(\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)\)

    • \(\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})\) where each subset \(E_k\) has \(p_k\) positive examples and \(n_k\) negative examples.

  • Decision Trees

Sequential Decision Problems

  • Bellman Equation

    • \(U(s) = R(s) + \gamma\, max_{a \in A(s)} \sum\limits_{s'} P(s'|\,s,a)U(s')\)

  • Value Iteration

    • \(U_{i+1}(s) \leftarrow R(s) + \gamma \,max_{a\in A(s)} \sum\limits_{s'} P(s'|\,s,a)U_i(s')\)

  • Convergence of Value Iteration

    • For the updates on all states; \(\delta = \) Maximum difference between old and new utility.

    • Quit when \(\delta < \epsilon(1-\gamma)/\gamma\) where \(\gamma\) is the discount factor.

Reinforcement Learning
Dont’ know transition model \((P'|s,a)\) or reward function \(R(s)\) for each state.
Markov Decision Process: State at time t depends on only previous state and action
\(\text{Pr}(s_{t+1}=s'|\,s_t=s,\,a_t=a) = T(s,a,s')\)

  • Direct Estimation

    • Given trials, reward functions

    • Utility of a state is the sum_of_total_rewards/number_of_trials_it_is_in

  • Adaptive Dynamic Programming (ADP)

  • Active ADP

  • Temporal Difference Learning (TD)

    • Look forward one step to figure out utilities.

  • Q-Learning