Sequential Decision Problems

Bellman Equation
- \(U(s) = R(s) + \gamma\, max_{a \in A(s)} \sum\limits_{s'} P(s'|\,s,a)U(s')\)
Value Iteration
- \(U_{i+1}(s) \leftarrow R(s) + \gamma \,max_{a\in A(s)} \sum\limits_{s'} P(s'|\,s,a)U_i(s')\)
Convergence of Value Iteration
- For the updates on all states; \(\delta = \) Maximum difference between old and new utility.
- Quit when \(\delta < \epsilon(1-\gamma)/\gamma\) where \(\gamma\) is the discount factor.