EECS492 Chapter 16: Decision Theory Notes

Introduction to Decision Theory

Decision Theory: Deals with choosing among actions based on the desirability of their immediate outcomes; the environment is thus assumed to be episodic.

\(P(RESULT(a) = s'|\,a,e)\)
RESULT(a): random variable whose values are possible outcome states, given action a.
\(P(RESULT(a) = s'|\,a,e)\): probability of outcome s’, given that action a is executed and given the evidence observations e.

\(EU(a|e) = \sum\limits_{s'} P(RESULT(a) = s'|\,a,e)U(s')\)
U(s): Utility function, a single number that expresses desirability of a state.
Expected utility: the average utility of the outcomes, weighted by the probability that each outcome occurs.

\(action = \textrm{argmax}_{a}\: EU(a|e)\)
Maximum Expected Utility (MEU): A rational agent should choose the action that maximizes the agent’s expected utility.
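
The MEU rule can be sketched in a few lines of Python. This is only an illustration, assuming the outcome distribution \(P(RESULT(a) = s'|\,a,e)\) is already available as a table for one fixed piece of evidence e; the action names, states, probabilities, and utilities are all made up.

```python
# Illustrative sketch: every name and number below is hypothetical.

def expected_utility(outcome_probs, utility):
    """EU(a|e) = sum over s' of P(RESULT(a) = s' | a, e) * U(s')."""
    return sum(p * utility[s] for s, p in outcome_probs.items())


def meu_action(outcome_model, utility):
    """Return the action whose expected utility is highest (the argmax over a)."""
    return max(outcome_model,
               key=lambda a: expected_utility(outcome_model[a], utility))


# U(s): a single number per outcome state.
utility = {"dry": 10.0, "damp": 4.0, "soaked": -20.0}

# P(RESULT(a) = s' | a, e) for one fixed evidence e (say, a cloudy sky).
outcome_model = {
    "take_umbrella": {"dry": 0.7, "damp": 0.3, "soaked": 0.0},
    "leave_umbrella": {"dry": 0.6, "damp": 0.0, "soaked": 0.4},
}

best = meu_action(outcome_model, utility)
```

With these made-up numbers, taking the umbrella has the higher expected utility, so it is the action selected.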

“If an agent acts so as to maximize a utility function that correctly reflects the performance measure, then the agent will achieve the highest possible performance score (averaged over all the possible environments).”

Utility Theory

Constraints on Rational Preferences


  • \(A \succ B\) the agent prefers A over B

  • \(A \sim B\) the agent is indifferent between A and B

  • \(A \succeq B\) the agent prefers A over B or is indifferent between them.

A and B are not states but sets of outcomes for each action, i.e. lotteries. A lottery \(L\) with possible outcomes \(S_1,...,S_n\) that occur with probabilities \(p_1,...,p_n\) is written \(L = [p_1,S_1;\,p_2, S_2;\, ...;\, p_n, S_n].\)
Each outcome \(S_i\) of a lottery can be either an atomic state or another lottery.
Preference relations must satisfy six constraints:

  1. Orderability: Given any two lotteries, a rational agent must either prefer one to the other or rate them as equally preferable.
    Exactly one of \(A \succ B\), \(A \sim B\), \(B \succ A\) holds.

  2. Transitivity: Given any three lotteries, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C.
    \((A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C)\)

  3. Continuity: If some lottery B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1-p.

  4. Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them. (This holds regardless of the probabilities and the other outcome(s) in the lotteries.)

  5. Monotonicity: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B, then the agent must prefer the lottery that has a higher probability for A (and vice versa).

  6. Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability. “No fun in gambling” rule: two consecutive lotteries can be compressed into a single equivalent lottery.
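
Decomposability is the easiest of the six constraints to make concrete. The sketch below assumes a hypothetical representation in which a lottery is a list of (probability, outcome) pairs and an outcome is either an atomic state (a string) or another lottery (a list); it reduces a compound lottery to an equivalent simple one by multiplying probabilities along each path.

```python
from collections import defaultdict


def flatten(lottery):
    """Reduce a compound lottery to an equivalent simple lottery.

    Representation (an assumption of this sketch): a lottery is a list of
    (probability, outcome) pairs; an outcome is a string (atomic state)
    or a list (nested lottery).
    """
    flat = defaultdict(float)
    for p, outcome in lottery:
        if isinstance(outcome, list):
            # Nested lottery: multiply probabilities along the path.
            for q, state in flatten(outcome):
                flat[state] += p * q
        else:
            flat[outcome] += p
    return [(prob, state) for state, prob in flat.items()]


# [p, A; (1-p), [q, B; (1-q), C]]  ~  [p, A; (1-p)q, B; (1-p)(1-q), C]
compound = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
simple = flatten(compound)
```

Here `simple` assigns probability 0.5 to A, 0.5 * 0.4 to B, and 0.5 * 0.6 to C, so the two-stage gamble and the one-stage gamble describe the same distribution over atomic states.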

Preferences lead to utility

  • Existence of a Utility Function: If an agent’s preferences obey the axioms of utility, then there exists a function \(U\) such that \(U(A) > U(B)\) if and only if A is preferred to B, and \(U(A) = U(B)\) if and only if the agent is indifferent between A and B.
    \(U(A) > U(B) \Leftrightarrow A \succ B\)
    \(U(A) = U(B) \Leftrightarrow A \sim B\)

  • Expected Utility of a Lottery: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome.
    \(U([p_1, S_1;...;p_n,S_n]) = \sum\limits_{i} p_iU(S_i).\)
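
As a small sketch of the expected-utility formula for lotteries, the function below evaluates \(\sum_i p_iU(S_i)\) and recurses when an outcome is itself a lottery. The list-of-pairs representation and the utility values for A, B, and C are assumptions chosen for illustration.

```python
def lottery_utility(lottery, state_utility):
    """U([p_1, S_1; ...; p_n, S_n]) = sum_i p_i * U(S_i).

    A lottery is a list of (probability, outcome) pairs. An outcome that is
    itself a list is treated as a nested lottery and evaluated recursively.
    """
    total = 0.0
    for p, outcome in lottery:
        if isinstance(outcome, list):
            u = lottery_utility(outcome, state_utility)
        else:
            u = state_utility[outcome]
        total += p * u
    return total


# Hypothetical utilities of three atomic states.
state_utility = {"A": 1.0, "B": 0.5, "C": 0.0}

simple = [(0.25, "A"), (0.75, "C")]
nested = [(0.5, "B"), (0.5, simple)]

u_simple = lottery_utility(simple, state_utility)  # 0.25*1.0 + 0.75*0.0
u_nested = lottery_utility(nested, state_utility)  # 0.5*0.5 + 0.5*u_simple
```

Since \(U(B) = 0.5\) exceeds `u_simple`, an agent with these utilities prefers getting B for sure to the lottery `simple`, which is exactly the comparison the existence result licenses.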

Utility Functions

A utility function maps lotteries to real numbers.
An agent can have any preferences that it wants; preferences themselves cannot be irrational.

Utility assessment and utility scales

  • Preference Elicitation: Process that involves presenting choices to the agent and using the observed preferences to pin down the underlying utility function.

  • Normalized Utility: Fix the utility of the “best” possible outcome and of the “worst” possible outcome. Normalized utilities use a scale with worst = 0 and best = 1.

  • Use a Standard Lottery \([p, util_{max};\,(1-p),util_{min}]\) to assess the utility of any particular prize \(S\): \(p\) is adjusted until the agent is indifferent between \(S\) and the standard lottery. On the normalized scale, the utility of \(S\) is then \(p\).
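
The adjustment of \(p\) can be sketched as a bisection search. This is one possible procedure, not a prescribed one. It takes \(p\) to be the probability of the best outcome, so that on the normalized scale the standard lottery has expected utility \(p\), and it relies on monotonicity (raising \(p\) only makes the lottery more attractive). The `prefers_prize` oracle and the simulated agent are hypothetical stand-ins for asking a real agent.

```python
def elicit_utility(prefers_prize, tol=1e-4):
    """Bisect on p until the agent is (nearly) indifferent.

    prefers_prize(p) is a hypothetical oracle: it returns True when the agent
    prefers getting S for sure over the standard lottery that yields the best
    outcome with probability p and the worst outcome with probability 1 - p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_prize(p):
            lo = p  # lottery not attractive enough yet: raise p
        else:
            hi = p  # lottery is at least as good as S: lower p
    return (lo + hi) / 2


# Simulated agent whose hidden normalized utility for S is 0.65.
# It prefers S whenever that utility exceeds the lottery's expected utility p.
hidden_utility = 0.65
estimate = elicit_utility(lambda p: hidden_utility > p)
```

The returned `estimate` approximates the indifference point, and hence the normalized utility of \(S\), to within the tolerance `tol`.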