Homework 2

Demo

OK. During the lecture I mentioned that I would give you an R script with an animation to demonstrate what happens when we test \(H_0\). First, remember that the distribution associated with a test statistic comes from assuming that \(H_0\) is correct, or better said, that it cannot be rejected. Second, remember the thought experiment we did to understand where the standard error of the mean comes from. We are going to repeat that experiment and then continue with the distribution of the test statistic. It went like this:

I send you to the same forest over and over to measure the diameter at chest height of \(n\) oak trees. Every time you come back with the measurements, I ask you to calculate the mean of your sample.

Eventually you will accumulate many mean values, one for each sample you took. Your data will look something like this:

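Sample    Calculated mean (cm)
1         30.3
2         28.7
3         29.2
...       ...
\(m\)     32.8

Here \(m\) is the number of times you went to the forest, and each mean is computed from the \(n\) trees measured on that trip.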

With this sample of sample means you can build the sampling distribution of the mean and see what it looks like. Let's do this thought experiment in silico, in R.
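In symbols, a minimal sketch of the thought experiment: write \(x_{ij}\) (notation introduced here for illustration) for the diameter of tree \(i\) measured on trip \(j\). Each trip gives one sample mean. If the diameters are normally distributed with mean \(\mu\) and standard deviation \(\sigma\), the sample means are themselves normally distributed, and their standard deviation is the standard error of the mean. With the population values used later in these notes (\(\sigma = 12\) cm, \(n = 30\)):

```latex
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad j = 1, \dots, m

\bar{x}_j \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right), \qquad
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{30}} \approx 2.19\ \text{cm}
```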

Generating samples in R

R has a number of functions that generate a sample of size \(n\) from different probability distributions. This means you can simulate what would happen if you sampled from a population whose variable of interest (a random variable) is normally, Poisson or binomially distributed, and so on.
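As a sketch of that idea, the three base-R generators below each draw 30 values: one from a normal, one from a Poisson and one from a binomial distribution. The Poisson rate (lambda = 4) and the binomial parameters (size = 10, prob = 0.3) are arbitrary values picked for illustration; they are not part of the lecture.

```r
# Parameter values below are hypothetical, chosen only for illustration.
set.seed(1)  # make the draws reproducible

# n = 30 values from each distribution
normalSample   <- rnorm(30, mean = 30, sd = 12)      # continuous values
poissonSample  <- rpois(30, lambda = 4)              # non-negative counts
binomialSample <- rbinom(30, size = 10, prob = 0.3)  # successes out of 10 trials

length(normalSample)   # 30
head(poissonSample)    # first few counts
head(binomialSample)   # first few success counts
```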

To draw a sample from a normal distribution you use the function rnorm(); its arguments are, in order, the sample size, the mean and the standard deviation. If you want one sample of size 30 from a normal distribution with mean 30 and standard deviation 12, you type (and get):

> rnorm(30,30,12)
 [1] 32.0630862  2.5455513  8.7773759 18.8421315  4.7099737 27.7924284
 [7] 31.8720324 48.8105854 20.9649556 18.7402612 18.0561054 16.8817670
[13] 32.2856895 40.4021944 30.8985218 15.3042285 36.1146812 48.2984814
[19] 56.3747680 36.6624128 37.3735503 30.1495664 27.4083394 32.0865640
[25] 23.4841123 30.9359746 23.7207280 41.6553560 14.8833778 -0.5735307

If you type the command again you will get a different sample from the same distribution. If you want more values, change the first argument to 1000 or 10000 or whatever you want.
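Because each call draws new random numbers, your values will differ from the ones printed above. A small sketch, assuming you want reproducible results: set.seed() fixes R's random number generator, and naming the arguments of rnorm() makes clear which number is which. The seed 42 is arbitrary.

```r
set.seed(42)                      # arbitrary seed, chosen for illustration
first  <- rnorm(30, mean = 30, sd = 12)
second <- rnorm(30, mean = 30, sd = 12)
identical(first, second)          # FALSE: each call gives a new sample

set.seed(42)                      # reset the seed...
again <- rnorm(30, mean = 30, sd = 12)
identical(first, again)           # TRUE: ...and the same sample comes back

bigSample <- rnorm(10000, mean = 30, sd = 12)  # more values: change the first argument
length(bigSample)                 # 10000
```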

With this function we simulate going out to the field to sample 30 trees from a population with mean 30 cm and standard deviation 12 cm. Now we want to simulate going to the field to sample over and over again. Essentially, we have to repeat this an arbitrary number \(m\) of times. We don't have to type the same command \(m\) times because in R we can program this and let the computer do what it does very well: repeat things over and over. So we type:

> for(i in seq(1,10,1)){
+ rnorm(30,30,12)
+ }

This creates a sequence from 1 to 10 in steps of 1 (i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) and, for each \(i\) in this sequence, draws a sample of 30 values from a normal distribution with mean 30 and standard deviation 12.
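Two details about this loop, shown in a short sketch: seq(1, 10, 1) produces the same values as 1:10, and inside a for loop R does not automatically print the value of each expression, which is why the loop shows nothing on screen. seq_len(m) is a common alternative for looping \(m\) times; the variable names here are illustrative only.

```r
# seq(from, to, by): start at 1, stop at 10, step by 1
steps <- seq(1, 10, 1)
print(steps)                           # the values 1, 2, ..., 10

# For steps of 1 the colon operator gives the same values
all(steps == 1:10)                     # TRUE

# seq_len(m) loops exactly m times
m <- 10
for (i in seq_len(m)) {
  x <- rnorm(30, mean = 30, sd = 12)   # one simulated field trip; overwritten each time
}
```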

So far so good, but the data is going nowhere: we need to store each sample somewhere. Better yet, we can store just the mean of each sample. So, if you type:

> sampleOfMeans<-c()
> for(i in seq(1,10,1)){
+ sampleOfMeans<-rbind(sampleOfMeans, mean(rnorm(30,30,12)))
+ }
> sampleOfMeans
 [1,] 34.12228
 [2,] 26.48039
 [3,] 27.69689
 [4,] 30.71665
 [5,] 26.70854
 [6,] 31.93827
 [7,] 30.33785
 [8,] 25.86912
 [9,] 31.96025
[10,] 30.52075
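Growing sampleOfMeans with rbind() copies the whole object on every iteration, which can become slow once you try 100000 trips. Below is a sketch of two common alternatives (not the lecture's script): preallocate a vector with numeric(m) and fill it, or let replicate() repeat the expression \(m\) times. Both give a plain numeric vector, so hist() works on it directly.

```r
m <- 10   # number of simulated field trips
n <- 30   # trees measured per trip

# Option 1: preallocate a numeric vector and fill it in the loop
sampleOfMeans <- numeric(m)
for (i in seq_len(m)) {
  sampleOfMeans[i] <- mean(rnorm(n, mean = 30, sd = 12))
}

# Option 2: let replicate() evaluate the same expression m times
sampleOfMeans2 <- replicate(m, mean(rnorm(n, mean = 30, sd = 12)))

length(sampleOfMeans)   # 10
length(sampleOfMeans2)  # 10
```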

Type hist(sampleOfMeans) and you'll get a histogram of the means; this is an approximation of the sampling distribution of the mean. Change the second argument of seq() (the 10) to 100, 1000, 10000 and 100000 and see what happens to the histogram. We are simulating going to the field 100, 1000, 10000 and 100000 times to measure 30 trees from our virtual oak population.
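To see the effect of \(m\) side by side and connect the histogram back to the standard error of the mean, here is a sketch that repeats the simulation for several values of \(m\), draws the histograms in a 2 x 2 grid, and compares the standard deviation of the simulated means with \(\sigma/\sqrt{n} = 12/\sqrt{30} \approx 2.19\) cm. The seed and the set of \(m\) values are arbitrary choices; add 100000 to trips if you are willing to wait a little longer.

```r
set.seed(2024)                       # arbitrary seed for reproducibility
n     <- 30                          # trees per field trip
mu    <- 30                          # population mean (cm)
sigma <- 12                          # population standard deviation (cm)
trips <- c(10, 100, 1000, 10000)     # values of m to compare

par(mfrow = c(2, 2))                 # 2 x 2 grid of plots
spread <- numeric(length(trips))
for (k in seq_along(trips)) {
  means <- replicate(trips[k], mean(rnorm(n, mean = mu, sd = sigma)))
  hist(means, main = paste("m =", trips[k]), xlab = "Sample mean (cm)")
  spread[k] <- sd(means)             # empirical spread of the sample means
}
par(mfrow = c(1, 1))                 # back to a single plot

# Standard error predicted by theory: sigma / sqrt(n)
theoreticalSE <- sigma / sqrt(n)     # about 2.19 cm
rbind(m = trips, sd_of_means = round(spread, 3), theory = round(theoreticalSE, 3))
```

As \(m\) grows, the histograms become smoother and bell-shaped around 30 cm, and the standard deviation of the means settles close to the theoretical standard error.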