Chapter 5 Lectures (cont.)

Tree Diagrams

Tree diagrams help us think through conditional probabilities by showing sequences of events as paths that look like the branches of a tree.
We often make tree diagrams when reversing the conditioning.
Suppose we want to know Prob(A \(\vert\) B), but we know only Prob(A), Prob(B), and Prob(B \(\vert\) A). We also know Prob(A and B), since Prob(A and B) = Prob(A) x Prob(B \(\vert\) A). From this information, we can find Prob(A \(\vert\) B) = Prob(A and B) / Prob(B).
\(\textbf{Example of tree diagram question}\)
Assume there is a screening test for a certain cancer that is 95 percent accurate if someone has the cancer. Also assume that if someone doesn’t have the cancer, the test is positive just 1 percent of the time. Assume further that 0.5 percent of people actually have this type of cancer. What is the probability that someone who tested positive for this cancer does not actually have the cancer, i.e., what fraction of positive results are false positives? (Done in the stats review notes.)
You need Bayes' rule to get the final answer:
\(\text{Prob}(\text{no cancer} \vert \text{positive})=\frac{\text{Prob}(\text{positive and no cancer})}{\text{Prob}(\text{positive})}=\frac{0.00995}{0.0147}\approx 0.68\)
As a reminder, whenever you draw one of these tree diagrams you will have to use Bayes' rule for the final answer. The numerator of the equation is P(A and B), which you find by multiplying along one branch of the tree: \(P(A \text{ and } B)=P(A)\cdot P(B \vert A)\). The denominator P(B) is the sum of every branch that ends in B.
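The tree for the cancer screening example can be sketched in a few lines of Python. This is only an illustration using the three numbers given in the example; the variable names are made up for this sketch.

```python
# Screening-test tree, using the numbers from the example above.
p_cancer = 0.005             # P(cancer)
p_pos_given_cancer = 0.95    # P(positive | cancer)
p_pos_given_healthy = 0.01   # P(positive | no cancer)

# Multiply along each branch of the tree to get the joint probabilities.
p_pos_and_cancer = p_cancer * p_pos_given_cancer
p_pos_and_healthy = (1 - p_cancer) * p_pos_given_healthy

# Add the two branches that end in "positive" to get P(positive).
p_pos = p_pos_and_cancer + p_pos_and_healthy

# Bayes' rule: P(no cancer | positive).
p_healthy_given_pos = p_pos_and_healthy / p_pos
print(round(p_healthy_given_pos, 2))  # 0.68
```

Even with a fairly accurate test, most positives are false here because the cancer itself is so rare.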
HIV example is in the midterm review.

Chapter 6 Lectures

Probability Model And Distributions

A probability distribution or probability distribution function (pdf) is a table or graph that gives all the outcomes of a random experiment and their probabilities.

Discrete vs Continuous

A random variable is called discrete if the outcomes are values that can be listed or counted, for example:

number of classes taken
roll of a die

A random variable is called continuous if the outcomes cannot be listed because they occur over a range, for example:

time to finish an exam
exact weight

\(\textbf{Examples}\)
The number of phone numbers stored on your phone: discrete
The length of your next phone call: continuous
The weight of a sandwich you are served at a deli: continuous
The time from when you leave your house to when you arrive at class: continuous
The number of people in the next passing car: discrete
The BAC of a driver pulled over by the police: continuous
The number of eggs laid by a randomly selected salmon: discrete

Discrete Probability Distributions

A common way to present the pdf of discrete data is a table with one column for the values x and one column for their probabilities p(x).
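As a small illustration, here is a sketch of such a table for one roll of a die. The die being fair (each face with probability 1/6) is an assumption made for this example, and `die_pdf` is just an illustrative name.

```python
from fractions import Fraction

# Assumed example: pdf table for one roll of a fair six-sided die.
die_pdf = {x: Fraction(1, 6) for x in range(1, 7)}

print("x  p(x)")
for x, p in die_pdf.items():
    print(x, " ", p)

# The probabilities in a valid pdf table must add up to 1.
total = sum(die_pdf.values())
print(total)  # 1
```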

Continuous Probability Distribution Functions

These are represented by curves; think of a Gaussian (bell curve). The area under the curve between two values of x represents the probability of x falling between those two values.
The total area under the curve is equal to 1.
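To make "probability is area" concrete, the sketch below adds up thin rectangles under a curve and compares the result with the built-in cumulative probability. The standard Normal curve and the helper name `area_under_curve` are choices made for this illustration; `statistics.NormalDist` needs Python 3.8 or newer.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)  # assumed example curve: standard Normal

def area_under_curve(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

area = area_under_curve(curve, -1, 1)
print(round(area, 4))                          # 0.6827
print(round(curve.cdf(1) - curve.cdf(-1), 4))  # 0.6827
```

Running the same helper over a very wide range, such as -10 to 10, gives an area of essentially 1.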

Normal Model

The Normal Model is a good fit if:
The distribution is unimodal
The distribution is approximately symmetric
The distribution is approximately bell shaped
A Normal distribution is defined by its mean (\(\mu\)) and standard deviation (\(\sigma\)). Shorthand for a Normal distribution is \[N(\mu,\sigma)\]
The Normal distribution is also called the Gaussian distribution or the bell curve.

Standardizing with Z scores

Z-scores are used to compare individual data values to their mean relative to their standard deviation.
The formula for calculating the z-score of a data value is:
\[z=\frac{\text{observed}-\text{mean}}{\text{SD}}\]

Z scores

Standardizing data into z-scores shifts the data by subtracting the mean and rescales the values by dividing by the standard deviation. The shift moves the center of the distribution to zero, so the mean of the z-scores is 0.
Standardizing into z-scores changes the spread by making the standard deviation 1.
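A quick way to see the shift and rescale is to standardize a small data set and check the new mean and standard deviation. The data values below are made up for illustration, and the population standard deviation (`pstdev`) is used as the SD, which is an assumption of this sketch.

```python
from statistics import mean, pstdev

# Made-up data values, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)       # 5
sigma = pstdev(data)  # 2.0 (population standard deviation)

# Shift by the mean, then rescale by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z_scores))    # 0.0
print(pstdev(z_scores))  # 1.0
```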

Shape, center, and spread of z-scores

When normally distributed data are converted to z scores, the new mean is 0 and the new standard deviation is 1, so \(z\sim N(0,1)\). Standardizing does not change the shape of the distribution.
Recall that a z score indicates how unusual a value is, because it tells us how many standard deviations the value is from the mean.
A negative z score says the value is below the mean; a positive z score says it is above the mean.
The larger the z score is in absolute value, whether negative or positive, the more unusual the value.

Calculating percentiles and probabilities with normal models

Z scores can tell us how unusual an observation is. To turn a z score into a percentile or a probability, we look it up in a standard Normal (z) table.
\(\textbf{Example}\) ACT scores are distributed normally with mean 21 and standard deviation 5. If Adam got a 27 on his ACT, what is his percentile score?
Calculate the z score: \(z=\frac{27-21}{5}=1.2\). Looking this up in a standard Normal table gives 0.8849.
So Adam's score is at the 88.49th percentile.
Values looked up in the table are percentiles, i.e., the proportion of scores below that point. Read as a probability, the chance of scoring below 27 is 0.8849. The chance of scoring higher is the complement, 1 - 0.8849 = 0.1151.
In general, the table gives the probability to the left of a z score; 1 minus that value gives the probability to the right of that point.
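If a z table is not at hand, the same lookup can be sketched with Python's `statistics.NormalDist` (Python 3.8 or newer), whose `cdf` method returns the area to the left. The variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist()       # N(0, 1), the curve behind the z table

z = (27 - 21) / 5                    # z score for Adam's 27
below = standard_normal.cdf(z)       # area to the left of z
above = 1 - below                    # complement: area to the right of z

print(z)                # 1.2
print(round(below, 4))  # 0.8849
print(round(above, 4))  # 0.1151
```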
\(\textbf{Example}\)
ACT scores are distributed normally with mean 21 and standard deviation 5. What percent of scores fall between 19 and 28 on the ACT?
\(z_{28}=\frac{28-21}{5}=1.4\), with probability 0.9192 to the left.
\(z_{19}=\frac{19-21}{5}=-0.4\), with probability 0.3446 to the left.
So the percent of scores between these values is \((0.9192-0.3446)\times 100=57.46\%\).
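The "area between two values" calculation is a subtraction of two left-tail areas. Here is a minimal sketch that works directly on the ACT scale, so no manual z score step is needed; `act` is a name chosen for this illustration. The result differs from the table answer in the last digit because table entries are rounded to four decimals.

```python
from statistics import NormalDist

act = NormalDist(mu=21, sigma=5)   # ACT model from the example

# P(19 < score < 28) = (area left of 28) - (area left of 19)
p_between = act.cdf(28) - act.cdf(19)
print(round(p_between, 4))  # about 0.5747
```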
\(\textbf{Example}\) You can also work backwards and find the observed value given the percentile.
If SAT scores are N(1500, 300) and Sophire scored at the 76th percentile, what was her actual score?
The 76th percentile has an associated z score, so find the table entry closest to 0.76; that is at z = 0.71. Then solve the z score equation for the observed value: \(\text{observed}=(300)(0.71)+1500=1713\).
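Working backwards from a percentile can also be sketched with `NormalDist.inv_cdf`, which plays the role of reading the z table in reverse. The names are illustrative. Because the table rounds z to two decimals, the software answer comes out near 1712 rather than exactly 1713.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)   # SAT model from the example

# z score whose left-tail area is 0.76 (the table's closest entry is 0.71).
z = NormalDist().inv_cdf(0.76)

# Solve z = (observed - mean) / SD for the observed value.
observed = sat.mean + z * sat.stdev

print(round(z, 2))      # 0.71
print(round(observed))  # 1712
```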
\(\textbf{Example}\)
Let’s assume SAT scores are N(1500, 300). Between what two scores do the middle 50\(\%\) of scores fall? On the probability curve, the middle 50\(\%\) is the area we want to cover, which leaves \(\frac{0.50}{2}=0.25\) in each tail. So we find the scores the same way we did above, but at the 25th percentile and the 75th percentile.
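The notes stop before the final numbers, so here is a hedged sketch of how the two cutoffs could be computed. The helper `middle_interval` is a hypothetical name written for this illustration; it splits the leftover area evenly between the two tails.

```python
from statistics import NormalDist

def middle_interval(dist, share):
    """Return the two values that bracket the middle `share` of dist."""
    tail = (1 - share) / 2            # area left over in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

sat = NormalDist(mu=1500, sigma=300)  # SAT model from the example
low, high = middle_interval(sat, 0.50)

print(round(low), round(high))  # 1298 1702
```

With a printed table, the closest z scores are about -0.67 and 0.67, which give roughly 1299 and 1701.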

The Binomial Model

The binomial probability distribution is a discrete probability distribution function.
It is useful in many situations where you have numerical variables that are counts or whole numbers.
The classic application of the binomial model is counting heads when flipping a coin.
The binomial model provides probabilities for random experiments in which you are counting the number of successes that occur. Four characteristics must be present:

1) Fixed number of trials: n
2) Each trial has only two outcomes: success and failure
3) The probability of success, p, is the same at each trial
4) The trials are independent

Computing binomial Probabilities

The formula for the binomial probability of exactly k successes, given probability of success p and a fixed number of trials n, is as follows: \[P(k)={n \choose k}p^k(1-p)^{n-k}\] where \[{n \choose k}=\frac{n!}{k!(n-k)!}\]
\(\textbf{Example}\)
A Stats 10 test has 4 multiple choice questions, each with four choices and one correct answer. If we just randomly guess on each of the 4 questions, what is the probability that we get exactly 2 questions correct?
Here n = 4, p = 0.25, and k = 2, so the probability is \({4 \choose 2}(0.25)^2(0.75)^2=6(0.0625)(0.5625)\approx 0.2109\)
If you are calculating binomial probabilities and the question asks for at least one success, use P(at least one) = 1 - P(0), where P is the binomial probability.
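The binomial formula translates almost directly into code. The sketch below uses `math.comb` (Python 3.8 or newer) for the "n choose k" part; `binomial_pmf` is a name invented for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.25   # 4 questions, 1-in-4 chance of guessing each correctly

exactly_two = binomial_pmf(2, n, p)
at_least_one = 1 - binomial_pmf(0, n, p)

print(round(exactly_two, 4))   # 0.2109
print(round(at_least_one, 4))  # 0.6836
```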

Expected Value and Standard Deviation for binomials

The expected value or mean = np
Standard deviation = \(\sqrt{np(1-p)}\)
The shape of the binomial distribution depends on both n and p.
Binomial distributions are symmetric when p = 0.5, and they are also approximately symmetric when n is large, even if p is close to 0 or 1.
When \(np\geq 10\) and \(n(1-p)\geq 10\), we can approximate the binomial with a Normal distribution that has the same mean and standard deviation as the binomial.
Once you have checked that your question meets these requirements, you already have the mean and standard deviation, so you can use z scores to find probabilities.
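The check-then-approximate routine can be sketched as follows. The 100 coin flips and the cutoff of 55 heads are made-up numbers for illustration, `binomial_summary` is a hypothetical helper, and no continuity correction is applied since the notes do not cover one.

```python
from math import sqrt
from statistics import NormalDist

def binomial_summary(n, p):
    """Return (mean, standard deviation, whether the Normal check passes)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok

# Assumed example: number of heads in 100 flips of a fair coin.
mean, sd, normal_ok = binomial_summary(100, 0.5)
print(mean, sd, normal_ok)  # 50.0 5.0 True

# Since the check passes, use a z score: approximate P(heads below 55).
z = (55 - mean) / sd
print(round(NormalDist().cdf(z), 4))  # 0.8413
```

For the 4-question quiz, np = 1, so the check fails and the exact binomial formula should be used instead.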