John Sherrill added Jan 16.tex over 9 years ago

Commit id: 9618968f888e89e7ffafc8a2a4e44e887620924a

The lecture covered part of section 1.1 in the book:
\begin{itemize}
  \item What is statistics? The study of data.
  \item What is data? Data is tables.
  \item What are tables? Tables are rows and columns.
  \item What are rows? Rows are units, the subjects of inquiry.
  \item What are columns? Columns are variables, the characteristics of the units.
\end{itemize}

In class we came up with two little examples of data:
\begin{center}
\begin{tabular}{c|c|c|c}
student & over 21 & hair color & male/female \\ \hline
Bob & no & brown & male \\
Sarah & no & black & female \\
Jean & yes & brown & female
\end{tabular}
\quad
\begin{tabular}{c|c|c|c}
student & net worth & height & distance home \\ \hline
Kristin & \$2,000 & 5'5'' & 200 mi \\
Jordan & \$1,000,000 & 6'0'' & 15 mi \\
Brad & \$12 & 5'7'' & 2,012 mi
\end{tabular}
\end{center}

Note how the types of variables in the table on the left differ from those in the table on the right. The variables in the table on the right are numbers: we can add, multiply, or divide them. They are called \textbf{quantitative variables}. The variables in the table on the left aren't really numbers. We can't add them up or multiply them. We call these types of variables \textbf{categorical variables}.

One important point: the ``over 21'' variable is categorical. Some folks mistake it for a quantitative variable because it \textit{involves} a number, but its only values are ``yes'' and ``no.''

We talked a little about how we could expect some of the variables to be distributed. I said that I figured that a height variable would look something like this:
<<>>=
# Simulated heights in feet, bell-shaped around 5.6
x <- rnorm(1000, mean = 5.6, sd = 0.2)
hist(x, main = "Maybe what a random sample of 1000 people's heights would look like",
     xlab = "feet")
@
And then I said maybe the ``distance home'' variable would look a little different. Maybe most people would live close by and fewer would live farther away, but no one could live a negative distance from campus.
Maybe the distribution would look like this:
<<>>=
# Simulated distances in miles: right-skewed and never negative
y <- rgamma(1000, shape = 2, rate = 0.009)
hist(y, main = "Maybe what a random sample of 1000 people's distances home would look like",
     xlab = "miles")
@
Just keep in mind that certain variables will be \textit{distributed} in certain ways.
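
Going back to the two class tables, here is a minimal sketch of how they might be entered in R as data frames. This was not part of the lecture: the names \texttt{left} and \texttt{right}, the column names, and the conversion of heights to inches are all assumptions made for illustration.
<<>>=
# The two class tables as data frames (values copied from the tables above;
# object and column names are hypothetical).
left <- data.frame(
  student    = c("Bob", "Sarah", "Jean"),
  over_21    = c("no", "no", "yes"),
  hair_color = c("brown", "black", "brown"),
  sex        = c("male", "female", "female"),
  stringsAsFactors = FALSE
)
right <- data.frame(
  student       = c("Kristin", "Jordan", "Brad"),
  net_worth     = c(2000, 1000000, 12),  # dollars
  height        = c(65, 72, 67),         # inches: 5'5'', 6'0'', 5'7''
  distance_home = c(200, 15, 2012),      # miles
  stringsAsFactors = FALSE
)

# A categorical variable is summarized by counting its categories.
table(left$over_21)

# A quantitative variable supports arithmetic, such as an average.
mean(right$distance_home)

# Which columns R stores as numbers (the student name is only a label).
sapply(right, is.numeric)
@
Counting makes sense for a categorical variable like ``over 21,'' while averaging only makes sense for a quantitative one like distance home.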