Sequential Decision Problems

  • Bellman Equation

    • \(U(s) = R(s) + \gamma\, max_{a \in A(s)} \sum\limits_{s'} P(s'|\,s,a)U(s')\)

  • Value Iteration

    • \(U_{i+1}(s) \leftarrow R(s) + \gamma \,max_{a\in A(s)} \sum\limits_{s'} P(s'|\,s,a)U_i(s')\)

  • Convergence of Value Iteration

    • For the updates on all states; \(\delta = \) Maximum difference between old and new utility.

    • Quit when \(\delta < \epsilon(1-\gamma)/\gamma\) where \(\gamma\) is the discount factor.