# Introduction to Decision Theory

Decision Theory: Deals with choosing among actions based on the desirability of their immediate outcomes; the environment is therefore assumed to be episodic.

$$P(RESULT(a) = s'|\,a,e)$$
RESULT(a): Random variable whose values are the possible outcome states of action a.
$$P(RESULT(a) = s'|\,a,e)$$: Probability of outcome s', given action a and evidence observations e.

$$EU(a|e) = \sum\limits_{s'} P(RESULT(a) = s'|\,a,e)U(s')$$
U(s): Utility function, a single number that expresses the desirability of state s.
EU(a|e): Expected utility, the average utility of the outcomes, each weighted by the probability of that outcome occurring.
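As a minimal sketch, assuming the outcome distribution and the utility function are already known and stored as plain dictionaries (the names `expected_utility`, `probs`, and `U` are hypothetical), the weighted sum above can be computed directly:

```python
from typing import Dict

def expected_utility(outcome_probs: Dict[str, float],
                     utility: Dict[str, float]) -> float:
    """EU(a|e): sum over s' of P(RESULT(a) = s' | a, e) * U(s')."""
    total_p = sum(outcome_probs.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1")
    return sum(p * utility[s] for s, p in outcome_probs.items())

# Hypothetical outcome distribution for one action a, given evidence e.
probs = {"win": 0.2, "draw": 0.5, "lose": 0.3}
# Hypothetical utilities U(s') for each outcome state.
U = {"win": 10.0, "draw": 2.0, "lose": -5.0}

print(expected_utility(probs, U))
```

Here EU = 0.2 * 10 + 0.5 * 2 + 0.3 * (-5) = 1.5.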

$$action = \textrm{argmax}_a\, EU(a|e)$$
Maximum Expected Utility (MEU): A rational agent should choose the action that maximizes its expected utility.
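The MEU rule can be sketched as an argmax over actions, assuming a hypothetical model that gives each action's outcome distribution as a dictionary (the function `meu_action` and the umbrella scenario are illustrative, not from the source):

```python
from typing import Dict, Tuple

def meu_action(model: Dict[str, Dict[str, float]],
               utility: Dict[str, float]) -> Tuple[str, float]:
    """Return argmax_a EU(a|e) together with its expected utility.

    model[a] plays the role of P(RESULT(a) = s' | a, e) for each action a.
    """
    def eu(action: str) -> float:
        return sum(p * utility[s] for s, p in model[action].items())

    # max() keeps the first action encountered when several tie.
    best = max(model, key=eu)
    return best, eu(best)

# Hypothetical umbrella decision: carrying it guarantees staying dry but is
# a small burden; leaving it risks getting wet.
model = {
    "take_umbrella": {"dry_burdened": 1.0},
    "leave_umbrella": {"dry_free": 0.6, "wet": 0.4},
}
U = {"dry_burdened": 7.0, "dry_free": 10.0, "wet": 0.0}

action, value = meu_action(model, U)
print(action, value)
```

Taking the umbrella has EU 7, leaving it has EU 0.6 * 10 + 0.4 * 0 = 6, so the MEU agent takes the umbrella even though leaving it could produce the single best outcome.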

“If an agent acts so as to maximize a utility function that correctly reflects the performance measure, then the agent will achieve the highest possible performance score (averaged over all the possible environments.)”

# Utility Theory

## Constraints on Rational Preferences

Notation

• $$A \succ B$$ the agent prefers A over B

• $$A \sim B$$ the agent is indifferent between A and B

• $$A \succeq B$$ the agent prefers A over B or is indifferent between them.

A and B are not states but lotteries: the set of possible outcomes of an action, each occurring with some probability. A lottery $$L$$ with possible outcomes $$S_1,...,S_n$$ that occur with probabilities $$p_1,...,p_n$$ is written $$L = [p_1,S_1;\,p_2, S_2;\, \ldots;\, p_n, S_n].$$
Each outcome $$S_i$$ of a lottery can be either an atomic state or another lottery.
Rational preference relations must obey six constraints:

1. Orderability: Given any two lotteries, a rational agent must either prefer one to the other or rate them as equally preferable.
Exactly one of $$A \succ B$$, $$A \sim B$$, $$B \succ A$$ holds.

2. Transitivity: Given any three lotteries, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C.

3. Continuity: If some lottery B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1-p.

4. Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them. (This holds regardless of the probabilities and the other outcome(s) in the lotteries.)

5. Monotonicity: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B, then the agent must prefer the lottery that has a higher probability for A (and vice versa).

6. Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability. “No fun in gambling” rule: two consecutive lotteries can be compressed into a single equivalent lottery.
$$[p,A;\,1-p,[q,B;\,1-q,C]]\,\sim\,[p,A;\,(1-p)q,B;\,(1-p)(1-q),C].$$
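Decomposability can be illustrated by flattening a nested lottery. This is a minimal sketch under an assumed representation (not from the source): an outcome is either an atomic state given as a string or another lottery, and a lottery is a list of `(probability, outcome)` pairs; `flatten` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple, Union

# Hypothetical representation: an outcome is an atomic state (a str) or a
# nested lottery; a lottery is a list of (probability, outcome) pairs.
Outcome = Union[str, "Lottery"]
Lottery = List[Tuple[float, Outcome]]

def flatten(lottery: Lottery, weight: float = 1.0,
            acc: Optional[DefaultDict[str, float]] = None) -> Dict[str, float]:
    """Reduce a compound lottery to a single lottery over atomic states.

    An atomic state's probability is the product of the probabilities on
    the path leading to it, summed over every path that reaches it.
    """
    if acc is None:
        acc = defaultdict(float)
    for p, outcome in lottery:
        if isinstance(outcome, str):
            acc[outcome] += weight * p
        else:
            # Descend into the sub-lottery, scaling by the branch probability.
            flatten(outcome, weight * p, acc)
    return dict(acc)

p, q = 0.5, 0.4
compound = [(p, "A"), (1 - p, [(q, "B"), (1 - q, "C")])]
print(flatten(compound))
```

With p = 0.5 and q = 0.4 the result matches the right-hand side of the equation: [0.5, A; 0.2, B; 0.3, C].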

## Preferences lead to utility

• Existence of a Utility Function: If an agent's preferences obey the axioms of utility, then there exists a function $$U$$ such that $$U(A) > U(B)$$ if and only if A is preferred to B, and $$U(A) = U(B)$$ if and only if the agent is indifferent between A and B.
$$U(A) > U(B) \Leftrightarrow A \succ B$$
$$U(A) = U(B) \Leftrightarrow A \sim B$$
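For a finite set of atomic outcomes, the two conditions above can be met by ranking. The sketch below assumes (hypothetically) that the agent's preferences are already given as indifference classes ordered from worst to best, which orderability and transitivity make possible. Note that a rank-based assignment satisfies only these two conditions; it does not by itself guarantee the expected-utility property for lotteries described next. The names `utility_from_ranking` and `relation` are illustrative.

```python
from typing import Dict, List

def utility_from_ranking(tiers: List[List[str]]) -> Dict[str, float]:
    """Assign utilities from indifference classes ordered worst -> best.

    Outcomes in the same tier get equal utility (A ~ B); outcomes in a
    later tier get strictly larger utility (A > B).
    """
    utility: Dict[str, float] = {}
    for rank, tier in enumerate(tiers):
        for outcome in tier:
            if outcome in utility:
                raise ValueError(f"{outcome!r} appears in more than one tier")
            utility[outcome] = float(rank)
    return utility

def relation(u: Dict[str, float], a: str, b: str) -> str:
    """Read the preference relation between a and b back off U."""
    if u[a] > u[b]:
        return ">"   # a is preferred to b
    if u[a] == u[b]:
        return "~"   # indifferent
    return "<"       # b is preferred to a

# Hypothetical preferences: lose < (draw ~ refund) < win.
tiers = [["lose"], ["draw", "refund"], ["win"]]
u = utility_from_ranking(tiers)
print(relation(u, "win", "draw"), relation(u, "draw", "refund"))
```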

• Expected Utility of a Lottery: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome.
$$U([p_1, S_1;...;p_n,S_n]) = \sum\limits_{i} p_iU(S_i).$$
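A minimal sketch of this formula, assuming the same hypothetical nested representation of lotteries (lists of `(probability, outcome)` pairs, with atomic states as strings). When an outcome is itself a lottery, its utility is computed recursively by the same rule:

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: an outcome is an atomic state (a str) or a
# nested lottery; a lottery is a list of (probability, outcome) pairs.
Outcome = Union[str, "Lottery"]
Lottery = List[Tuple[float, Outcome]]

def lottery_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """U([p1,S1; ...; pn,Sn]) = sum_i p_i * U(S_i), recursing into sub-lotteries."""
    total = 0.0
    for p, outcome in lottery:
        if isinstance(outcome, str):
            total += p * utility[outcome]
        else:
            total += p * lottery_utility(outcome, utility)
    return total

# Hypothetical utilities for three atomic states.
U = {"A": 1.0, "B": 0.5, "C": 0.0}
compound = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
flat = [(0.5, "A"), (0.2, "B"), (0.3, "C")]
print(lottery_utility(compound, U), lottery_utility(flat, U))
```

Both lotteries evaluate to 0.5 * 1 + 0.2 * 0.5 + 0.3 * 0 = 0.6, consistent with the decomposability constraint.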

# Utility Functions

A utility function maps from lotteries to real numbers.
An agent can have any preferences it wants: the constraints above restrict only how preferences relate to one another, so no preference is irrational in itself.

## Utility assessment and utility scales

• Preference Elicitation: The process of presenting choices to the agent and using the observed preferences to pin down the underlying utility function.

• Normalized Utility: Establish a “best” utility and a “worst” utility. Normalized utilities use a scale with Worst = 0 and Best = 1.
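A minimal sketch of normalization, assuming utilities are stored in a dictionary (the name `normalize` is hypothetical). It applies the affine map u ↦ (u − worst) / (best − worst), which sends the worst outcome to 0 and the best to 1. Because the map is a positive affine transformation, it preserves the ordering of outcomes and does not change which lottery has the higher expected utility.

```python
from typing import Dict

def normalize(utility: Dict[str, float]) -> Dict[str, float]:
    """Rescale utilities so the worst outcome maps to 0 and the best to 1."""
    worst = min(utility.values())
    best = max(utility.values())
    if best == worst:
        raise ValueError("all outcomes have the same utility")
    return {s: (u - worst) / (best - worst) for s, u in utility.items()}

# Hypothetical raw utilities.
raw = {"lose": -5.0, "draw": 2.0, "win": 10.0}
n = normalize(raw)
print(n)
```

Here "lose" maps to 0, "win" to 1, and "draw" to (2 + 5) / (10 + 5) = 7/15.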

• Use a Standard Lottery $$[p, util_{max};\,(1-p),util_{min}]$$ to assess the utility of any particular prize $$S$$. $$p$$ is adjusted until the agent is indifferent between $$S$$ and the standard lottery. With normalized utilities ($$util_{max} = 1$$, $$util_{min} = 0$$), the utility of $$S$$ is then given by $$p$$.
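The adjustment of p can be sketched as a bisection search. This assumes a hypothetical query `prefers_sure_thing(p)` that asks the agent whether it prefers S for sure over the standard lottery that yields the best prize with probability p and the worst with probability 1 − p. By monotonicity, the answer switches from "yes" to "no" only once as p grows, so bisection converges to the indifference point. The simulated agent below, with a hidden normalized utility of 0.7, is purely illustrative.

```python
from typing import Callable

def elicit_utility(prefers_sure_thing: Callable[[float], bool],
                   tol: float = 1e-6) -> float:
    """Bisect on p until the agent is (nearly) indifferent between S and
    the standard lottery [p, best; 1 - p, worst]; return that p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_sure_thing(p):
            lo = p   # lottery not attractive enough yet: raise p
        else:
            hi = p   # lottery at least as good as S: lower p
    return (lo + hi) / 2

# Simulated agent (hypothetical) whose hidden normalized utility for S is 0.7.
# With best = 1 and worst = 0, the lottery's utility is p*1 + (1-p)*0 = p.
hidden_u = 0.7
estimate = elicit_utility(lambda p: hidden_u > p)
print(round(estimate, 4))
```

A real elicitation would replace the lambda with questions posed to a person, and the tolerance reflects how finely the agent can actually discriminate between lotteries.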