WORKING DRAFT authorea.com/6087

# Introduction to Decision Theory

Decision Theory: Dealing with choice among actions based on the desirability of their immediate outcomes; environment is thus episodic.

$$P(RESULT(a) = s'|\,a,e)$$
RESULT(a): random variable whose values are possible outcome states, given action a.
Probability of outcome s’, given evidence observations e.

$$EU(a|e) = \sum\limits_{s'} P(RESULT(a) = s'|\,a,e)U(s')$$
U(s): Utility function, a single number that expresses desirability of a state.
Average utility value of the outcomes, weighted by probability of outcome occuring.

$$action = \textrm{argmax}\: EU(a|e)$$
Maximum Expected Utility (MEU): Rational agent should choose the action that maximizes agent’s expected utility.

“If an agent acts so as to maximize a utility function that correctly reflects the performance measure, then the agent will achieve the highest possible performance score (averaged over all the possible environments.)”

# Utility Theory

## Constraints on Rational Preferences

Notation

• $$A \succ B$$ the agent prefers A over B

• $$A \sim B$$ the agent is indifferent between A and B

• $$A \succeq B$$ the agent prefers A over B or is indifferent between them.

A and B are not states, but a set out outcomes for each action–a lottery. A lottery $$L$$ with possible outcomes $$S_1,...,S_n$$ that occurs with probabilities $$p_1,...,p_n$$: $$L = [p_1,S_1;\,p_2, S_2;\, ... p_n, S_n].$$
Each outcome $$S_i$$ of a lottery can be either an atomic state or another lottery.
Preferences relations must require six constraints:

1. Orderability: Given any two lotteries, a rational agent must either prefer one to the other or rate them as equally preferable.
Exactly one of $$A \succ B$$, $$A \sim B$$, $$B \succ A$$

2. Transitivity: Given any three lotteries, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C

3. Continuity: If some lottery B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1-p.

4. Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except B is substitued for A in one of them. (This holds regardsless of the probabilities and the other outcome(s) in the lotteries.

5. Monotinicity: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B, then the agent must prefer the loterry that has a higher probability for A (and vice versa)

6. Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability. “No fun in gambling” rule: two consecutive lotteries can be compressed into a single equivalent lottery.
$$[p,A;\,1-p,[q,B;\,1-q,C]]\,\sim\,[p,A;\,(1-p)q,B;\,(1-p)(1-q),C].$$

• Existence of a Utility Function: If an agent’s preferences obey the axioms of utility; then there exists a function $$U$$ such that $$U(A) > U(B)$$ if and only if A is preferred to B and U(A) = U(B) if and only iff the agent is indifferent between A and B.
$$U(A) > U(B) \Leftrightarrow A \succ B$$
$$U(A) = U(B) \Leftrightarrow A \sim B$$

• Expected Utility of a Lottery: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome.
$$U([p_1, S_1;...;p_n,S_n]) = \sum\limits_{i} p_iU(S_i).$$

# Utility Functions

A utility is a function that maps from lotteries to real numbers.
An agent can have any preferences that it wants; Preferences themselves cannot be irrational.

## Utility assessment and utility scales

• Pereference Elicitation: Process that involves presenting chocies to the agent and using the observed preferences to pin down the underlying utility function

• Normalized Utility: Establish a “best” utility and a “worst” utility. Normalized Utility use a scale with Worst = 0 and Best = 1.

• Use a Standard Lottery $$[p, util_{min};\,(1-p),util_{max}]$$ to assess utility of any paticular prize $$S$$. $$p$$ is adjusted until the agent is indifferent between $$S$$ and the standard lottery. Utility of $$S$$ is given by $$p$$.

## The utility of money

• Agents have a $$\textbf{monotonic preference}$$ for more money. In other words, agents prefer more money to less.

• ex. Given choice between $1,000,0000 or gamble a coin flip for$2,500,000

• Expected Monetary Value: 1/2*($0) + 1/2($2,500,000) = $1,250,000 • To correctly make a decision, we must calculate the expected utilities of the two actions of accepting and declining the gamble. $$S_n$$: state of possessing total wealth $$n$$. current wealth is$k.

• EU(Accept) = $$1/2*U(S_k) + 1/2*U(S_{k+2,500,000})$$

• EU(Decline) = $$U(S_{k+1,000,000})$$

• Utility is not directly proportional to monetary value: for example the first million is worth more than the rest. Grayson (1960) found that the utility of money was close to the logarithm of the amount.

• Risk Averse: Agents that prefer a sure thing that is less than the expected value

• Risk Seeking: Agents that prefer to gamble for the expected value

• Risk Neutral: Agent that has a linear curve

• Certainty Equivalent: Value an agent will accept in lieu of a lottery.

• Insurance Premium: Difference between the Expected Monetary Value (EMV) and the certainty equivalent.

• Error

• Expected value of Error: $$E(\widehat{EU}(a|e)-EU(a|e))$$

• a: action, e: evidence

• With error, we are choosing only positive estimates of expected utility because of the argmax.

• With greater k (choices), we can generate a more precise utility estimate distribution.

# Decision Networks, p626

## Representing a decision problem with a decision network

• A decision networks represents:

• Information about agent’s current state

• Possible actions

• State that will result from the action

• Utility of current state

• Types of Nodes

• Chance Nodes (ovals): Represent random variables. Each chance node is associated with a conditional distribution indexed by state of parent nodes. In decision networks, parent nodes can include decision nodes as well as chance nodes.

• Decision nodes (rectang;es): represent points where decision maker has a choice of actions. For example AirportSite can take on a different value for each site under consideration => these choices influence cost, safety, noise that will result. (this chapter deals only with one decision)

• Utility nodes (diamonds): Represent agent’s utility function. Has a description of agent’s utility as a function of all parent attributes. This could be a table or parameterized additive or linear function.

• Simplification/C