EECS 492 Review Material

    Planning

    • GraphPlan

      • Mutex Conditions

        • Inconsistent Effects: (Actions) An effect of one action negates an effect of the other.

        • Complements: (Literals) A literal and its negation, A and !A.

        • Interference: (Actions) One action deletes a precondition of the other.

        • Inconsistent Support: (Literals) Every pair of actions that could achieve the two literals is mutex.

        • Competing Needs: (Actions) Some pair of their preconditions is mutex at the previous literal level (the action-mutex checks are sketched after this list).
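
    A minimal Python sketch of the action-level mutex checks, assuming a hypothetical Action record with pre, adds, and dels sets of literals; literal_mutexes holds the mutex literal pairs from the previous level:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset    # precondition literals
    adds: frozenset   # positive effects (add list)
    dels: frozenset   # negated effects (delete list)

def actions_mutex(a1: Action, a2: Action, literal_mutexes: set) -> bool:
    """True if the two actions are mutex at the current action level."""
    # Inconsistent effects: one action's delete list undoes the other's add list.
    if a1.dels & a2.adds or a2.dels & a1.adds:
        return True
    # Interference: one action deletes a precondition of the other.
    if a1.dels & a2.pre or a2.dels & a1.pre:
        return True
    # Competing needs: some pair of their preconditions is mutex at the previous level.
    return any(frozenset((p, q)) in literal_mutexes
               for p in a1.pre for q in a2.pre)
```

    The literal-level rules then follow: two literals are mutex if they are complements, or if every pair of actions that could achieve them is mutex (inconsistent support).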

    • Partial Order Planning

      • Setup

        • Steps

          • Add Start (its effects are the initial state) and Finish (its preconditions are the goal)

        • Causal Links: \(A \xrightarrow{p} B\) records that step A achieves precondition p of step B; p must be protected between A and B

        • Ordering Constraints: \(A \prec B\) means step A must come before step B (see the sketch below)
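
    A rough sketch (hypothetical representation) of how that setup can be stored, plus the threat check used when resolving conflicts:

```python
# Literals are (symbol, is_positive) pairs, so negation just flips the flag.
def negate(literal):
    symbol, positive = literal
    return (symbol, not positive)

# Initial partial plan: only Start and Finish, ordered Start before Finish.
steps = {"Start", "Finish"}
orderings = {("Start", "Finish")}   # (A, B) means step A must come before step B
causal_links = set()                # (A, p, B) means step A achieves precondition p of step B

def threatens(step_effects, link):
    """A step threatens causal link (A, p, B) if one of its effects negates p."""
    _, p, _ = link
    return negate(p) in step_effects
```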

    Bayesian Networks

    • Querying Information

      • \(Pr(x_1,\ldots,x_n)=\prod\limits_{i=1}^{n} Pr(x_i|\,parents(X_i))\)

      • For variables whose values are not specified in the query or evidence (hidden variables), sum the chain-rule product over all of their possible values (see the sketch below).
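
    For example, a minimal sketch of that summation on a hypothetical two-node network Rain → WetGrass (the numbers are purely illustrative):

```python
# Illustrative CPTs for a hypothetical network Rain -> WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.3, False: 0.7}}

def joint(rain, wet):
    # Chain rule over the network: Pr(rain, wet) = Pr(rain) * Pr(wet | rain)
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# Query Pr(WetGrass = true): Rain is unspecified (hidden), so sum it out.
p_wet = sum(joint(rain, True) for rain in (True, False))
print(p_wet)   # 0.2*0.9 + 0.8*0.3 = 0.42
```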

    • Conditional Independence

      • A node is conditionally independent of its other predecessors, given its parents.

      • Each variable is conditionally independent of its non-descendants, given its parents.

      • Markov blanket: a node is conditionally independent of all other nodes in the network, given its Markov blanket (its parents, children, and children’s parents).

      • Three conditions that block a path between two nodes; every path must be blocked by one of them for conditional independence (sketched below)

        • Common parent known (fork)

        • Middle of chain known (chain)

        • Common child unknown (collider), and none of its descendants known
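
    As a sketch, those three conditions can be read as a per-triple blocking test (hypothetical helper; a full check applies it to every consecutive triple on every path between the two nodes):

```python
def triple_blocked(kind, middle_observed, descendant_observed=False):
    """Whether a triple X - M - Y on a path is blocked, given the evidence.

    kind: 'fork'     X <- M -> Y  (common parent)   -> blocked when M is known
          'chain'    X -> M -> Y  (middle of chain) -> blocked when M is known
          'collider' X -> M <- Y  (common child)    -> blocked when M and its descendants are unknown
    """
    if kind in ('fork', 'chain'):
        return middle_observed
    if kind == 'collider':
        return not (middle_observed or descendant_observed)
    raise ValueError(f"unknown triple kind: {kind}")
```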

    • Bayes’ rule: \(Pr(A|\,B) = \frac{Pr(B|\,A)\,Pr(A)}{Pr(B)}\), used to “flip” the direction of a conditional probability.

    Decision Learning

    • Information Gain

      • Entropy of a boolean variable that is true with probability q

        • \(B(q) = -(q \log_2 q+(1-q)\log_2(1-q))\)

      • Entropy of the goal attribute on the whole set is \(B(\frac{p}{p+n})\)

      • \(\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)\)

      • \(\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})\) where each subset \(E_k\) has \(p_k\) positive examples and \(n_k\) negative examples.
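
    Those three formulas translate directly into a short sketch (the example counts at the bottom are illustrative only):

```python
from math import log2

def B(q):
    """Entropy of a Boolean variable that is true with probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * log2(q) + (1 - q) * log2(1 - q))

def remainder(splits):
    """splits: a list of (p_k, n_k) counts, one pair per value of attribute A."""
    p = sum(pk for pk, _ in splits)
    n = sum(nk for _, nk in splits)
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk))
               for pk, nk in splits if pk + nk > 0)

def gain(p, n, splits):
    """Information gain of attribute A on a set with p positive and n negative examples."""
    return B(p / (p + n)) - remainder(splits)

# Splitting 6 positive / 6 negative examples into subsets (4,0), (2,2), (0,4):
print(gain(6, 6, [(4, 0), (2, 2), (0, 4)]))   # 1 - 1/3 = 0.666...
```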

    • Decision Trees

    Sequential Decision Problems

    • Bellman Equation

      • \(U(s) = R(s) + \gamma\, \max_{a \in A(s)} \sum\limits_{s'} P(s'|\,s,a)\,U(s')\)

    • Value Iteration

      • \(U_{i+1}(s) \leftarrow R(s) + \gamma\, \max_{a\in A(s)} \sum\limits_{s'} P(s'|\,s,a)\,U_i(s')\)

    • Convergence of Value Iteration

      • After updating every state, let \(\delta\) be the maximum difference between the old and new utility of any state.

      • Quit when \(\delta < \epsilon(1-\gamma)/\gamma\) where \(\gamma\) is the discount factor.
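
    A minimal sketch of value iteration with that stopping test; the state set, actions, P, and R are hypothetical inputs (transition model as nested dicts):

```python
def value_iteration(states, actions, P, R, gamma, epsilon):
    """P[(s, a)] maps s' -> probability; R[s] is the reward; actions(s) lists legal actions."""
    U = {s: 0.0 for s in states}
    while True:
        U_new, delta = {}, 0.0
        for s in states:
            # Bellman update: U_{i+1}(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) U_i(s')
            best = max((sum(p * U[s2] for s2, p in P[(s, a)].items())
                        for a in actions(s)), default=0.0)
            U_new[s] = R[s] + gamma * best
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        # Converged: the true utilities are now within epsilon of U.
        if delta < epsilon * (1 - gamma) / gamma:
            return U
```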

    Reinforcement Learning

    https://www.dropbox.com/s/s7rl78omhzjj4sq/Screenshot%202014-04-25%2015.47.28.png
    Don’t know the transition model \(P(s'|\,s,a)\) or the reward function \(R(s)\) for each state.
    Markov Decision Process: the state at time \(t+1\) depends only on the previous state and action:
    \(\text{Pr}(s_{t+1}=s'|\,s_t=s,\,a_t=a) = T(s,a,s')\)

    • Direct Estimation

      • Given a set of trials and the rewards observed along them.

      • The utility of a state is estimated as the total observed reward-to-go from that state, summed over all trials, divided by the number of times the state is visited (see the sketch below).
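
    A minimal sketch of direct utility estimation, assuming each trial is given as a list of (state, reward) pairs (hypothetical format):

```python
from collections import defaultdict

def direct_estimate(trials, gamma=1.0):
    """Average the observed reward-to-go of each state over every trial that visits it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for trial in trials:
        reward_to_go = 0.0
        # Walk the trial backwards so reward-to-go accumulates from the end.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```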

    • Adaptive Dynamic Programming (ADP)

    • Active ADP

    • Temporal Difference Learning (TD)

      • Look ahead one step: after observing a transition \(s \to s'\) with reward \(R(s)\), update \(U(s) \leftarrow U(s) + \alpha\,(R(s) + \gamma\,U(s') - U(s))\); no transition model is needed (sketched below).
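
    A one-line sketch of that update (hypothetical dictionary-based utilities):

```python
def td_update(U, s, r, s_next, alpha, gamma):
    """Apply one TD update after observing the transition s -> s_next with reward r."""
    U[s] = U.get(s, 0.0) + alpha * (r + gamma * U.get(s_next, 0.0) - U.get(s, 0.0))
    return U
```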

    • Q-Learning: learn action utilities \(Q(s,a)\) directly, without needing a transition model (update sketched below).
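
    A minimal sketch of the Q-learning update (hypothetical dictionary-based Q-table):

```python
def q_update(Q, s, a, r, s_next, legal_actions, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in legal_actions), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q
```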