# Planning

• GraphPlan

• Mutex Conditions

• Inconsistent Effects: An effect of one action negates an effect of the other

• Complements: (Literals) One is the negation of the other, i.e. A and !A

• Interference: One action deletes a precondition of the other

• Inconsistent Support: (Literals) Every pair of actions that could achieve the two literals is mutex

• Competing Needs: The actions have mutually exclusive preconditions (a mutex-checking sketch follows this list)
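
The action-mutex conditions can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical `Action` tuple whose preconditions and effects are sets of string literals, with `!x` denoting the negation of `x`; `mutex_literals` stands in for the literal mutexes of the previous level.

```python
from collections import namedtuple

# Hypothetical representation: literals are strings, "!x" is the negation of "x".
Action = namedtuple("Action", ["name", "preconds", "effects"])

def negate(literal):
    return literal[1:] if literal.startswith("!") else "!" + literal

def inconsistent_effects(a, b):
    # An effect of one action negates an effect of the other.
    return any(negate(e) in b.effects for e in a.effects)

def interference(a, b):
    # An effect of one action deletes a precondition of the other.
    return (any(negate(e) in b.preconds for e in a.effects) or
            any(negate(e) in a.preconds for e in b.effects))

def competing_needs(a, b, mutex_literals):
    # Some pair of preconditions is marked mutex at the previous literal level.
    return any((p, q) in mutex_literals or (q, p) in mutex_literals
               for p in a.preconds for q in b.preconds)

def actions_mutex(a, b, mutex_literals=frozenset()):
    return (inconsistent_effects(a, b) or interference(a, b) or
            competing_needs(a, b, mutex_literals))

go = Action("Go(store)", preconds={"AtHome"}, effects={"AtStore", "!AtHome"})
cook = Action("Cook", preconds={"AtHome"}, effects={"Dinner"})
print(actions_mutex(go, cook))  # True: Go deletes Cook's precondition AtHome
```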

• Partial Order Planning

• Setup

• Steps

• Add Start (whose postconditions are the initial state) and Finish (whose preconditions are the goal)

• Ordering Constraints (e.g., Start must precede Finish; a setup sketch follows)
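
A minimal sketch of the setup step, with hypothetical names (`Step`, `make_initial_plan`): Start supplies the initial state as effects, Finish demands the goal as preconditions, and the only initial ordering constraint is Start before Finish.

```python
from collections import namedtuple

Step = namedtuple("Step", ["name", "preconds", "effects"])

def make_initial_plan(initial_state, goal):
    # Start has no preconditions; its postconditions (effects) are the initial state.
    start = Step("Start", preconds=frozenset(), effects=frozenset(initial_state))
    # Finish has no effects; its preconditions are the goal literals.
    finish = Step("Finish", preconds=frozenset(goal), effects=frozenset())
    steps = {start, finish}
    orderings = {("Start", "Finish")}              # Start must precede Finish
    open_preconds = {(p, "Finish") for p in goal}  # preconditions left to achieve
    return steps, orderings, open_preconds
```

The planner would then repeatedly pick an open precondition, choose a step that achieves it, and add causal links and ordering constraints.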

# Bayesian Networks

• Querying Information

• $$Pr(x_1,\dots,x_n)=\prod\limits_{i=1}^{n} Pr(x_i \mid parents(X_i))$$

• For variables whose values are not specified in the query (hidden variables), sum the product over all of their possible values (see the enumeration sketch below).
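
A minimal sketch of both points, using a hypothetical two-node network Rain → WetGrass with made-up CPT numbers: `joint` is the chain-rule product above, and `marginal` sums it over the hidden variables.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass. parents maps node -> parent tuple;
# cpt maps (value, parent_values) -> probability.
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {
    "Rain":     {(True, ()): 0.2, (False, ()): 0.8},
    "WetGrass": {(True, (True,)): 0.9, (False, (True,)): 0.1,
                 (True, (False,)): 0.1, (False, (False,)): 0.9},
}

def joint(assignment):
    # Chain rule: product of Pr(x_i | parents(X_i)) over all nodes.
    p = 1.0
    for var, ps in parents.items():
        p *= cpt[var][(assignment[var], tuple(assignment[q] for q in ps))]
    return p

def marginal(query):
    # Sum the joint over every value of the unspecified (hidden) variables.
    hidden = [v for v in parents if v not in query]
    return sum(joint(dict(query, **dict(zip(hidden, vals))))
               for vals in product([True, False], repeat=len(hidden)))

print(marginal({"WetGrass": True}))  # 0.2*0.9 + 0.8*0.1 = 0.26
```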

• Conditional Independence

• A node is conditionally independent of its other predecessors, given its parents.

• Each variable is conditionally independent of its non-descendants, given its parents.

• Markov blanket: A node is conditionally independent of all other nodes in the network, given its Markov blanket: its parents, children, and children’s parents.

• Three conditions that block a path; independence requires every path to be blocked (a checking sketch follows this list)

• Common parent known

• Middle of chain known

• Common child unknown (and none of its descendants known)
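
These three conditions can be checked per path. Below is a sketch, assuming the network is given as a `parents` dict and a path as a list of node names; `evidence` is the set of known (observed) nodes. All names here are illustrative.

```python
def descendants(node, parents):
    # All nodes reachable from `node` by following child edges.
    children = {v: [c for c, ps in parents.items() if v in ps] for v in parents}
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, evidence, parents):
    # Check every consecutive triple (a, b, c) against the three conditions.
    for a, b, c in zip(path, path[1:], path[2:]):
        chain = (a in parents[b] and b in parents[c]) or \
                (c in parents[b] and b in parents[a])
        fork = b in parents[a] and b in parents[c]       # common parent
        collider = a in parents[b] and c in parents[b]   # common child
        if (chain or fork) and b in evidence:
            return True  # known middle-of-chain or common parent blocks
        if collider and b not in evidence and not (descendants(b, parents) & evidence):
            return True  # unknown common child (and descendants) blocks
    return False

parents = {"Cloudy": (), "Rain": ("Cloudy",), "Sprinkler": ("Cloudy",)}
print(path_blocked(["Rain", "Cloudy", "Sprinkler"], {"Cloudy"}, parents))  # True
```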

• Bayes’ rule flips the direction of conditioning: $$Pr(A \mid B) = \frac{Pr(B \mid A)\,Pr(A)}{Pr(B)}$$ (worked example below)
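
A worked example with made-up numbers: a test with 90% sensitivity, a 5% false-positive rate, and a 1% prior; Bayes’ rule flips $$Pr(+\mid disease)$$ into $$Pr(disease \mid +)$$.

```python
# Hypothetical numbers, not from the notes.
p_pos_given_d, p_pos_given_not_d, p_d = 0.9, 0.05, 0.01
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)  # total probability
p_d_given_pos = p_pos_given_d * p_d / p_pos                  # Bayes' rule
print(p_d_given_pos)  # ~0.154
```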

# Decision Learning

• Information Gain

• Entropy of a boolean variable that is true with probability q

• $$B(q) = -\big(q \log_2 q+(1-q)\log_2(1-q)\big)$$

• Entropy of the goal attribute on the whole set is $$B(\frac{p}{p+n})$$, where the set has $$p$$ positive and $$n$$ negative examples

• $$\text{Gain}(A) = B(\frac{p}{p+n})-\text{Remainder}(A)$$

• $$\text{Remainder}(A) = \sum\limits_{k=1}^{d}\frac{p_k+n_k}{p+n}\,B(\frac{p_k}{p_k+n_k})$$ where each subset $$E_k$$ has $$p_k$$ positive examples and $$n_k$$ negative examples (computed in the sketch below).
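
The three formulas above translate directly into code. A minimal sketch with hypothetical counts (each split entry is the $$(p_k, n_k)$$ pair for one value of attribute $$A$$):

```python
from math import log2

def B(q):
    # Entropy of a Boolean variable that is true with probability q.
    if q in (0.0, 1.0):
        return 0.0
    return -(q * log2(q) + (1 - q) * log2(1 - q))

def remainder(splits):
    # splits: list of (p_k, n_k) counts, one pair per value of attribute A.
    p = sum(pk for pk, nk in splits)
    n = sum(nk for pk, nk in splits)
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)

def gain(p, n, splits):
    return B(p / (p + n)) - remainder(splits)

# 12 examples (p=6, n=6) split three ways by a hypothetical attribute:
print(gain(6, 6, [(4, 0), (2, 2), (0, 4)]))  # ~0.667 bits
```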

• Decision Trees

# Sequential Decision Problems

• Bellman Equation

• $$U(s) = R(s) + \gamma\, \max_{a \in A(s)} \sum\limits_{s'} P(s' \mid s,a)\,U(s')$$

• Value Iteration

• $$U_{i+1}(s) \leftarrow R(s) + \gamma\, \max_{a\in A(s)} \sum\limits_{s'} P(s' \mid s,a)\,U_i(s')$$

• Convergence of Value Iteration

• After the update is applied to all states, let $$\delta$$ be the maximum difference between the old and new utility of any state.

• Quit when $$\delta < \epsilon(1-\gamma)/\gamma$$, where $$\gamma$$ is the discount factor and $$\epsilon$$ the maximum allowed utility error (implemented in the sketch below).
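
A minimal value-iteration sketch using the update and termination test above. The representation is an assumption: `P[s][a]` is a list of `(next_state, prob)` pairs, `R[s]` the reward, and `actions(s)` the non-empty set of actions available in `s`, with $$0 < \gamma < 1$$.

```python
def value_iteration(states, actions, P, R, gamma, epsilon):
    # Assumed representation: P[s][a] -> [(s_next, prob), ...]; R[s] -> reward;
    # actions(s) -> non-empty iterable of actions; 0 < gamma < 1.
    U = {s: 0.0 for s in states}
    while True:
        U_new, delta = {}, 0.0
        for s in states:
            best = max(sum(p * U[s2] for s2, p in P[s][a]) for a in actions(s))
            U_new[s] = R[s] + gamma * best
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < epsilon * (1 - gamma) / gamma:  # the termination test above
            return U
```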

# Reinforcement Learning

Figure (screenshot): https://www.dropbox.com/s/s7rl78omhzjj4sq/Screenshot%202014-04-25%2015.47.28.png
• Don’t know the transition model $$P(s' \mid s,a)$$ or the reward function $$R(s)$$ for each state.

• Markov Decision Process: the state at time $$t+1$$ depends only on the state and action at time $$t$$:

• $$\text{Pr}(s_{t+1}=s' \mid s_t=s,\,a_t=a) = T(s,a,s')$$

• Direct Estimation

• Given a set of trials and the rewards observed in each

• The utility estimate for a state is the sum of observed rewards-to-go from that state divided by the number of times the state is visited across trials (sketch below).
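
A minimal sketch of direct estimation, assuming each trial is a list of `(state, reward)` pairs; this every-visit variant treats each occurrence of a state as one sample of its reward-to-go.

```python
from collections import defaultdict

def direct_estimation(trials, gamma=1.0):
    # trials: list of [(state, reward), ...] sequences observed while acting.
    totals, counts = defaultdict(float), defaultdict(int)
    for trial in trials:
        reward_to_go = 0.0
        # Walk backwards so reward-to-go accumulates from the end of the trial.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

trials = [[("A", 0), ("B", 0), ("C", 1)], [("A", 0), ("C", 1)]]
print(direct_estimation(trials))  # {'C': 1.0, 'B': 1.0, 'A': 1.0}
```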