Chapter 7 lectures

survey terminology

Population-group of people or objects we wish to study
Parameter-numerical value that characterizes some aspect of the population
Census-survey in which every member of the population is measured
Statistic-number derived from the sample that estimates a population parameter

population vs sample

population-collection of ALL data values.
population size= usually very large and often unknown. normally impossible to obtain all values.
measures that come from the population are \(\textbf{parameters}\)

a sample is a subset of the population whose characteristics we can actually measure.
sample size is the number of observations in a sample, n.
measures that come from the sample are called statistics

notation

Greek letters are used to represent population parameters and Latin letters to represent sample statistics. Use \(\mu\) and \(\sigma\) for the population, and use \(\bar{x}\) and \(s\) for sample statistics.

Statistical Inference

statistical inference is the art and science of drawing conclusions about a population on the basis of observing only a small subset of that population.
There is always uncertainty in inference, so we need to measure that uncertainty.

Sample Surveys

opinion polls are examples of sample surveys, designed to ask questions of a small group in hopes of learning something about the entire population.
Professional pollsters want to ensure the sample they take is representative of the population.

Bias

a method is biased if it has a tendency to produce an untrue value.
\(\textbf{sampling bias}\) results from taking a sample that is not representative of the population
\(\textbf{Measurement bias}\) comes from asking questions that do not produce a true answer due to confusing wording or misleading questions
In a \(\textbf{Voluntary response sample}\) a large group of individuals is invited to respond, and only those who choose to respond are counted.
These samples are almost always biased.
In \(\textbf{Convenience Sampling}\) we only include the individuals that are convenient to reach, and this may not be representative of the entire population.

non-response

\(\textbf{Non-response error}\) occurs when those who respond may differ from those who do not.

questions to ask about bias

  • what percentage of people who were asked to participate actually did so?

  • did the researchers choose people to participate or did the people themselves choose to participate?

  • did the researchers leave out whole segments of the population who are likely to answer the question differently from the rest of the population?


Simple Random Sampling

involves randomly drawing people from the population without replacement.
We want to make sure every possible sample of the size we plan to draw has an equal chance of being selected.
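
A minimal sketch of drawing a simple random sample in Python. The population of ID numbers 1 to 5000 and the function name `simple_random_sample` are made up for illustration; `random.sample` from the standard library draws without replacement, so every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members without replacement."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1..5000 (an assumption for illustration)
population = list(range(1, 5001))
sample = simple_random_sample(population, 100, seed=1)

# No member appears twice because sampling is without replacement
print(len(sample), len(set(sample)))
```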

Accuracy and Precision, Bias and standard error

Bias is a measure of the accuracy.
\(\textbf{example}\)
If only basketball players are measured to estimate the proportion of Americans who are taller than 6 feet, then the estimate is biased toward a larger proportion.
Standard error is a measure of precision
For a simple random sample, the bias of the sample proportion is 0.
This is equivalent to saying that the mean of all the sample proportions equals the population proportion.
Precision gets better with larger sample sizes.
Precision and bias do not depend on the population size as long as the population size is at least \(10\) times larger than the sample size.
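
The two claims above (zero bias, better precision with larger samples) can be illustrated with a small simulation. This is only a sketch under assumptions: the true proportion `p = 0.15` and the sample sizes are chosen arbitrarily, and each observation is generated independently, which approximates a simple random sample from a very large population.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Return reps simulated sample proportions, each from a sample of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p = 0.15  # assumed true population proportion
results = {}
for n in (25, 100, 400):
    props = simulate_sample_proportions(p, n, reps=2000)
    # mean of the sample proportions (should be near p) and their spread
    results[n] = (statistics.mean(props), statistics.pstdev(props))
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

The mean of the simulated proportions should stay close to p for every n, while the spread should shrink as n grows.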

Sampling distributions

we take random samples of populations to make some inference about a population parameter
Sample proportions are written as \(\hat{p}\), and their variability can be expressed through a sampling distribution, which gives us a mean and a standard deviation to describe \(\hat{p}\).

The Central Limit Theorem for sample proportions

If trials are random and independent and the sample and population sizes are large, then the sampling distribution of \(\hat{p}\) is approximately normal and follows:
\[\text{CLT: }\hat{p}\approx N\left(\text{mean}(\hat{p})=p,\ SD(\hat{p})=\sqrt{\frac{p(1-p)}{n}}\right)\]

conditions for CLT for sample proportions

  • random and independent

  • large sample: \(np\geq 10\) and \(n(1-p)\geq 10\)

  • large population: population size needs to be at least 10 times the sample size
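
The two numerical conditions can be checked with a short helper. This is a sketch; the function name `clt_conditions_ok` is made up, and the first condition (random and independent trials) depends on how the data were collected, so it cannot be checked by code.

```python
def clt_conditions_ok(n, p, population_size):
    """Check the large-sample and large-population conditions only."""
    large_sample = n * p >= 10 and n * (1 - p) >= 10
    large_population = population_size >= 10 * n
    return large_sample and large_population

print(clt_conditions_ok(n=100, p=0.15, population_size=5000))  # both conditions hold
print(clt_conditions_ok(n=20, p=0.15, population_size=5000))   # np = 3 is too small
```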

Finding probabilities with CLT

An iPod has 5000 songs. Assume 15 percent are classical and we pick 100 at random for a playlist. What is the probability that no more than 10 percent of the songs will be classical?
Check:
np=(100)(.15)=15, so this works.
n(1-p)=100(.85)=85, this works too.
Large population: 5000 is at least 10(100)=1000, so this works.
So \(\hat{p}\approx N(0.15,\sqrt{\frac{0.15*0.85}{100}})\), where the SD is 0.0357. This reduces to a z score problem: \(z=\frac{.10-.15}{0.0357}=-1.40\). Looking this up on the z table gives a probability of 0.0808.
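
The iPod calculation can also be reproduced without a z table. The sketch below builds the standard normal CDF from `math.erf` in the standard library; the helper name `normal_cdf` is an illustrative choice. The z score is negative because 0.10 lies below the mean of 0.15.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.15, 100
sd = math.sqrt(p * (1 - p) / n)  # SD of the sampling distribution of p-hat
z = (0.10 - p) / sd              # z score for a sample proportion of 0.10
prob = normal_cdf(z)             # P(p-hat <= 0.10)

print(round(sd, 4), round(z, 2), round(prob, 3))
```

The result should agree with the table lookup to about three decimal places; small differences come from rounding z to two decimals before using the table.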

Are sample proportions enough

Knowing that there is some variability associated with our sampling procedure, why not provide an interval?
It is usually the sample proportion \(\hat{p}\), not the true population proportion \(p\), that is known.

standard deviation vs Standard error

Since we often don't know p, we can't find the true standard deviation, so we use an estimate called the \(\textbf{standard error}\):
\[SE(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Confidence Interval

By the empirical rule, about 95 percent of sample proportions \(\hat{p}\) fall within 2 SEs of p.
So if we reach out 2 SEs in either direction of \(\hat{p}\), we can be 95 percent confident that this interval contains the true proportion. This is called a 95 percent \(\textbf{confidence interval}\).

Why do we calculate CIs?

Constructing a confidence interval is a way to estimate the true population proportion, p, when all we have is the sample proportion, \(\hat{p}\).
In statistics, when we estimate a population parameter we provide an interval that we believe contains the parameter, and we accompany the estimate with a measure of how certain we are (the confidence level).
The interval is built by adding and subtracting a margin of error (ME) to the sample statistic.
The confidence interval is then estimate \(\pm\) ME.

Calculating ME and CI

the margin of error depends on two criteria:
how variable is \(\hat{p}\)
how confident do you want to be of your estimate
In general, to calculate the confidence interval: \[\hat{p} \pm z^{*}\,SE(\hat{p})\]
where \(SE(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) and \(z^{*}\) is the critical z score.

Z score

The critical z is determined by how confident we want to be in our estimate. For 95 percent confidence you would look that up on a table and see that the corresponding z score is 1.96.
The majority of the time we will be using 95 percent confidence intervals.

what does 95 percent mean?

95 percent confident means that 95 percent of random samples will produce confidence intervals that include the true population proportion.
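
This interpretation can be checked by simulation: draw many random samples, build an interval from each, and count how often the interval contains the true proportion. The sketch below assumes a true proportion of 0.35 and a sample size of 100 (both chosen arbitrarily) and generates observations independently, which approximates sampling from a very large population.

```python
import math
import random

def coverage_rate(p, n, reps, z_star=1.96, seed=0):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
        if p_hat - me <= p <= p_hat + me:
            hits += 1
    return hits / reps

rate = coverage_rate(p=0.35, n=100, reps=2000)
print(rate)  # should land near 0.95, not exactly on it
```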
\(\textbf{Example}\)
In a random sample of 100 UCLA students, 35 have traveled outside the US. Estimate the true population proportion of UCLA students who have traveled outside the US using a 95 percent confidence interval.
SE=\(\sqrt{\frac{0.35(0.65)}{100}}=0.0477\)
ME=1.96(SE)=0.0935, or about 0.09
(0.35 \(\pm\) 0.09), or roughly 0.26 to 0.44, is your confidence interval
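
The same example as a short calculation. The function name `proportion_ci` is made up for illustration, and it uses the critical value 1.96 for 95 percent confidence by default.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval for a proportion: p-hat +/- z* * SE(p-hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error
    me = z_star * se                         # margin of error
    return p_hat - me, p_hat + me

low, high = proportion_ci(35, 100)
print(round(low, 2), round(high, 2))
```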

Choosing Sample Size

You can find the sample size needed for a given margin of error. Start from \(ME=z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) and solve for n.
With some algebra, \(n=\frac{(z^{*})^2\hat{p}(1-\hat{p})}{ME^2}\). For 95 percent confidence, taking \(z^{*}\approx 2\) and the most conservative value \(\hat{p}=0.5\) gives the special case \(n=\frac{1}{ME^2}\).
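
A sketch of the sample size formula in code. The function name `required_sample_size` and the 3 percent margin of error in the usage lines are illustrative assumptions. The default `p_hat=0.5` is the conservative choice, since it makes \(\hat{p}(1-\hat{p})\) as large as possible, and the result is rounded up because a sample size must be a whole number.

```python
import math

def required_sample_size(me, p_hat=0.5, z_star=1.96):
    """Smallest n whose margin of error is at most me."""
    return math.ceil(z_star**2 * p_hat * (1 - p_hat) / me**2)

print(required_sample_size(0.03))            # using z* = 1.96
print(required_sample_size(0.03, z_star=2))  # matches 1/ME^2, rounded up
```

With \(z^{*}=2\) and \(\hat{p}=0.5\) the numerator equals 1, which is where the shortcut \(n=\frac{1}{ME^2}\) comes from.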