OK.

Now you have your data in R and you can check the names of the column headers.

For your homework you need to report the following summary statistics on a per-genus basis:

  1. Number of observations

  2. Mean

  3. Median

  4. Variance

  5. Standard deviation

  6. Standard error of the mean

  7. Range (Min and Max)

  8. First and third quartile, i.e. the 25th and 75th percentiles

First, let's just calculate everything for the entire dataset. For this you use the command summary(), passing the object that stores your dataset as an argument. There is a trick here: you have to tell summary() which column you want to summarize, otherwise it will summarize every column in your dataset. You can do this using $, like this: homework_1_data$L refers to the column with header "L".

If I summarize the dataset columns "L" and "H", i.e. I type summary(homework_1_data$L) and summary(homework_1_data$H), I get the following results:

    > summary(homework_1_data$L)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.2610  0.4760  0.5700  0.6106  0.7360  1.1720
    > summary(homework_1_data$H)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.1610  0.2600  0.3250  0.3574  0.4530  0.7960
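If you need the individual values rather than the printed table, the same numbers can be pulled out one at a time. A minimal sketch, using a small made-up vector x as a stand-in for a column such as homework_1_data$L (the values are purely illustrative and not taken from the homework data):

```r
# Toy stand-in for one column of the dataset (values are made up).
x <- c(0.2, 0.4, 0.5, 0.7, 1.2)

rng <- range(x)                            # minimum and maximum in one call
med <- median(x)                           # median
avg <- mean(x)                             # mean
qs  <- quantile(x, probs = c(0.25, 0.75))  # first and third quartile

print(rng)
print(med)
print(avg)
print(qs)
```

With its default settings, quantile() uses the same quartile definition that summary() uses for a numeric vector, so the two should agree.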

This leaves only the following statistics to be calculated:

  1. Number of observations

  2. Variance

  3. Standard deviation

  4. Standard error of the mean

You can calculate the variance and the standard deviation using the commands var() and sd(). I get:

    > var(homework_1_data$L)
    [1] 0.03787815
    > var(homework_1_data$H)
    [1] 0.0163248
    > sd(homework_1_data$H)
    [1] 0.1277686
    > sd(homework_1_data$L)
    [1] 0.1946231
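As a sanity check, the standard deviation is just the square root of the variance, and var() computes the sample variance (dividing by n - 1, not n). A small sketch on a made-up vector, not the homework data:

```r
# Toy vector (values are made up for illustration).
x <- c(0.2, 0.4, 0.5, 0.7, 1.2)
n <- length(x)

# Sample variance by hand: sum of squared deviations divided by (n - 1).
v_manual <- sum((x - mean(x))^2) / (n - 1)

print(all.equal(v_manual, var(x)))    # var() matches the manual calculation
print(all.equal(sqrt(var(x)), sd(x))) # sd() is the square root of var()
```

You can check the same relationship on the results above: the square root of 0.03787815 is about 0.1946, which is the value sd() reports for column "L".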

Now you only need to get the following:

  1. Number of observations

  2. Standard error of the mean

Believe it or not, R does not hand you the number of observations in your data matrix directly. There are reasons for this: for example, there may be missing data in some columns. The standard error of the mean you can derive from the standard deviation by dividing it by the square root of the sample size. So for these two you are in a bit of trouble. The good part is that you typically only need these values in the context of an analysis, and in that context they are reported.

A very simple workaround is to use the command nrow() on the entire dataset, but be warned (!): nrow() counts every row, so any missing values in the column of interest are counted as data points. If I just do that quick and dirty trick, I get the following:

    > nrow(homework_1_data)
    [1] 97

And I can calculate the standard error of the mean by doing:

    > sd(homework_1_data$L)/sqrt(nrow(homework_1_data))
    [1] 0.01976098
    > sd(homework_1_data$H)/sqrt(nrow(homework_1_data))
    [1] 0.01297293
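If a column does contain missing values, the nrow() shortcut overcounts. One possible way around this is to count only the non-missing entries with sum(!is.na(...)) and use that count as the sample size. The sketch below uses a made-up data frame called toy, and sem() is a hypothetical helper written for this example, not a base R function:

```r
# Hypothetical toy data frame with one missing value in column L.
toy <- data.frame(L = c(0.3, NA, 0.6, 0.9),
                  H = c(0.2, 0.3, 0.4, 0.5))

n_rows <- nrow(toy)            # counts every row, including the NA in L
n_L    <- sum(!is.na(toy$L))   # counts only the non-missing values in L

# Hypothetical helper: standard error of the mean, ignoring missing values.
sem <- function(x) {
  x <- x[!is.na(x)]            # drop missing values first
  sd(x) / sqrt(length(x))      # sd divided by sqrt of the sample size
}

print(n_rows)
print(n_L)
print(sem(toy$L))
print(sem(toy$H))
```

When a column has no missing values, as appears to be the case for "L" and "H" here (sd() would otherwise have returned NA), this gives the same result as dividing by sqrt(nrow(...)).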