EECS492 Chapter 16: Decision Theory Notes

Decision Theory: Dealing with choice among actions based on the desirability of their *immediate* outcomes; environment is thus episodic.

\(P(RESULT(a) = s'|\,a,e)\)

RESULT(a): random variable whose values are possible outcome states, given action **a**.

Probability of outcome **s’**, given evidence observations **e**.

\(EU(a|e) = \sum\limits_{s'} P(RESULT(a) = s'|\,a,e)U(s')\)

U(s): Utility function, a single number that expresses desirability of a state.

Average utility value of the outcomes, weighted by probability of outcome occuring.

\(action = \textrm{argmax}\: EU(a|e)\)

**Maximum Expected Utility (MEU)**: Rational agent should choose the action that maximizes agent’s expected utility.

“If an agent acts so as to maximize a utility function that correctly reflects the performance measure, then the agent will achieve the highest possible performance score (averaged over all the possible environments.)”

Notation

\(A \succ B\) the agent prefers A over B

\(A \sim B\) the agent is indifferent between A and B

\(A \succeq B\) the agent prefers A over B or is indifferent between them.

A and B are not states, but a set out outcomes for each action–a **lottery**. A lottery \(L\) with possible outcomes \(S_1,...,S_n\) that occurs with probabilities \(p_1,...,p_n\): \(L = [p_1,S_1;\,p_2, S_2;\, ... p_n, S_n].\)

Each outcome \(S_i\) of a lottery can be either **an atomic state or another lottery**.

Preferences relations must require six constraints:

**Orderability**: Given any two lotteries, a rational agent must either prefer one to the other or rate them as equally preferable.

Exactly one of \(A \succ B\), \(A \sim B\), \(B \succ A\)**Transitivity**: Given any three lotteries, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C**Continuity**: If some lottery B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1-p.**Substitutability**: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except B is substitued for A in one of them. (This holds regardsless of the probabilities and the other outcome(s) in the lotteries.**Monotinicity**: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B, then the agent must prefer the loterry that has a higher probability for A (and vice versa)**Decomposability**: Compound lotteries can be reduced to simpler ones using the laws of probability. “No fun in gambling” rule: two consecutive lotteries can be compressed into a single equivalent lottery.

\([p,A;\,1-p,[q,B;\,1-q,C]]\,\sim\,[p,A;\,(1-p)q,B;\,(1-p)(1-q),C].\)

**Existence of a Utility Function**: If an agent’s preferences obey the axioms of utility; then there exists a function \(U\) such that \(U(A) > U(B)\) if and only if A is preferred to B and U(A) = U(B) if and only iff the agent is indifferent between A and B.

\(U(A) > U(B) \Leftrightarrow A \succ B\)

\(U(A) = U(B) \Leftrightarrow A \sim B\)**Expected Utility of a Lottery**: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome.

\(U([p_1, S_1;...;p_n,S_n]) = \sum\limits_{i} p_iU(S_i).\)

A utility is a function that maps from lotteries to real numbers.

An agent can have any preferences that it wants; Preferences themselves cannot be irrational.

**Pereference Elicitation**: Process that involves presenting chocies to the agent and using the observed preferences to pin down the underlying utility function**Normalized Utility**: Establish a “best” utility and a “worst” utility. Normalized Utility use a scale with Worst = 0 and Best = 1.Use a

**Standard Lottery**\([p, util_{min};\,(1-p),util_{max}]\) to assess utility of any paticular prize \(S\). \(p\) is adjusted until the agent is indifferent between \(S\) and the standard lottery. Utility of \(S\) is given by \(p\).

Agents have a \(\textbf{monotonic preference}\) for more money. In other words, agents prefer more money to less.

ex. Given choice between $1,000,0000 or gamble a coin flip for $2,500,000

**Expected Monetary Value**: 1/2*($0) + 1/2($2,500,000) = $1,250,000To correctly make a decision, we must calculate the expected utilities of the two actions of accepting and declining the gamble. \(S_n\): state of possessing total wealth \(n\). current wealth is $k.

EU(Accept) = \(1/2*U(S_k) + 1/2*U(S_{k+2,500,000})\)

EU(Decline) = \(U(S_{k+1,000,000})\)

Utility is not directly proportional to monetary value: for example the first million is worth more than the rest. Grayson (1960) found that the utility of money was close to the

*logarithm*of the amount.**Risk Averse**: Agents that prefer a sure thing that is less than the expected value**Risk Seeking**: Agents that prefer to gamble for the expected value**Risk Neutral**: Agent that has a linear curve**Certainty Equivalent**: Value an agent will accept in lieu of a lottery.**Insurance Premium**: Difference between the Expected Monetary Value (EMV) and the certainty equivalent.

**Error**Expected value of Error: \(E(\widehat{EU}(a|e)-EU(a|e))\)

a: action, e: evidence

With error, we are choosing only positive estimates of expected utility because of the argmax.

With greater k (choices), we can generate a more precise utility estimate distribution.

A decision networks represents:

Information about agent’s current state

Possible actions

State that will result from the action

Utility of current state

Types of Nodes

**Chance Nodes**(ovals): Represent random variables. Each chance node is associated with a conditional distribution indexed by state of parent nodes. In decision networks, parent nodes can include decision nodes as well as chance nodes.**Decision nodes**(rectang;es): represent points where decision maker has a choice of actions. For example AirportSite can take on a different value for each site under consideration => these choices influence cost, safety, noise that will result. (this chapter deals only with one decision)**Utility nodes**(diamonds): Represent agent’s utility function. Has a description of agent’s utility as a function of all parent attributes. This could be a table or parameterized additive or linear function.Simplification/Compilation: Chance nodes describing outocome state are ommitted. Instead, utility node is connected directly to current-state nodes and the decision node.

Actions are selected by evaluating the decision network for each possible setting of the decision node.

Once the decision node is set, the ndoe behaves exactly like a chance node that has been set as an evidence variable.

Algorithm

Set the evidence variables for the current state

For each possible value of the decision node:

Set the decision node to that value

Calculate posterior probabilities for the parent nodes of the utility node, using a standard probabilistic inference algorithm

Calculate the resulting utility for the action

Return action with highest utility