Johnny Takes on Stats for Informatics: Simple Steps in Hypothesis Tests

Dahlia Hegab, Sonny Lin and Vijay Krishna Palepu

Abstract

At many universities, graduate students, researchers, and faculty engage in research projects without a firm grasp of statistical concepts and methods. We introduce a novel application, Johnny Takes on Stats for Informatics, that aims to define succinctly and clearly the statistical concepts that are crucial to conducting research accurately. We draw on storytelling narratives, such as those used in Rapunsel (www.rapunsel.org) and Alice 3D (www.alice.org), and integrate a JS application that walks researchers through using online tools (here, Google Trends) and data set files to analyze statistical information found online. We identify potential concerns with using data sets from Google Trends and similar data set providers, and discuss future considerations for statistical learning applications such as Johnny Takes on Stats for Informatics.

The purpose of this narrative tutorial on statistical methods is to give students a simplified way to learn statistical concepts grounded in real-life examples. Alongside the narrative, we present an interface that lets students interact with data sets that are accessible online. This interface also integrates material designed to clarify statistical concepts, while demonstrating how to apply those concepts together with research protocols and technologies such as Google Trends (www.google.com/trends/).

The goal is to give students interested in research a fun way to quickly and clearly understand the critical statistical concepts they need to keep experimenting with their own (or online) data sets, so that they can interpret future data sets in meaningful and statistically sound ways, while also giving them the opportunity to walk through real-life example data sets that they can access and manipulate for future use. This tutorial is also designed to help students gain the foundation

Our application, titled “Johnny Takes on Stats for Informatics”, extends earlier work that uses storytelling modules to teach complex subject material. Using a simplified storytelling methodology for instructional guides, inspired by instructional applications like Alice 3D and websites like killmath.com and http://vectors.usc.edu, we create a story-based online application to teach statistics concepts that individuals first learning statistical methods in a research setting often find challenging.

Our application begins with a narrative introducing our protagonist, Johnny, an Informatics student new to research and statistical methods who must interpret the findings of a research project he is working on. After briefly introducing him, we turn to his research question, emphasizing that every research study needs a clearly stated research question that contributes to scholarly knowledge. Johnny's research question, phrased as a claim, is: “people who update their Twitter accounts when using mobile devices update them more than people who update their Twitter statuses using computers and laptops.”

Relying on Schuyler Huck’s Reading Statistics and Research, we explain that this research question is actually an example of an alternative hypothesis (the hypothesis the researcher puts forward as the explanation for a particular phenomenon). Although many conference and journal publications frame their studies around the hypothesis they hope to support, in statistical testing the more relevant hypothesis to focus on is the null hypothesis, because rejecting it provides evidence in favor of the alternative hypothesis. We give an example of the null hypothesis in this scenario, which can be summarized as the opposing viewpoint to the alternative hypothesis. In this instance, the null hypothesis would be: “people who update their Twitter accounts when using mobile devices do not update their Twitter statuses more than people who update their Twitter statuses using computers and laptops.” In frequentist methods, researchers attempt to reject the null hypothesis; a result is conventionally reported as statistically significant when its p-value falls below a chosen significance level, most often .05 (more on this later).
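To make the p-value concrete, here is a minimal sketch of a one-sided test of Johnny's two hypotheses. The paper does not say which test Johnny uses; we swap in a permutation test because it needs only the Python standard library and makes the null hypothesis explicit: if device type has no effect, the "mobile" and "computer" labels are interchangeable, so reshuffling them shows how often chance alone produces a difference at least as large as the one observed. The function name `one_sided_permutation_test` and the weekly update counts are hypothetical, made-up illustrative values, not data from Johnny's study.

```python
import random
from statistics import mean


def one_sided_permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for H1: mean(treatment) > mean(control).

    Under H0 the group labels carry no information, so we repeatedly shuffle
    the pooled observations, split them back into groups of the original
    sizes, and count how often the shuffled difference in means is at least
    as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of all participants
        diff = mean(pooled[:n_treatment]) - mean(pooled[n_treatment:])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction: counts the observed labeling itself, so p is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical weekly status-update counts per participant.
    mobile = [14, 18, 11, 20, 16, 15, 19, 13]
    computer = [9, 12, 8, 11, 10, 13, 7, 10]

    alpha = 0.05  # conventional significance level
    p = one_sided_permutation_test(mobile, computer)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"p = {p:.4f}: {decision}")
```

With these made-up numbers the estimated p-value falls well below .05, so Johnny would reject the null hypothesis. A two-sample t-test is the more common textbook choice; the permutation version is shown only because its logic maps directly onto the definition of a p-value.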

We explain why careful terminology matters when framing a research question: the wording should emphasize the influence of one or more variables rather than claim direct causation. Because many factors can affect the scenario in question, we then explain ways to isolate and test the variable of interest so that a convincing argument can be made from the data.
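Random assignment is one standard way to isolate a variable: when chance alone decides who lands in the control group and who lands in the treatment (or “altered”) group, other factors such as age or general Twitter activity tend to balance out between the groups. The paper does not describe an assignment procedure; the sketch below assumes a simple shuffle-and-split in Python, and the function name `randomly_assign` and the participant IDs are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size.

    The control group would update Twitter from computers and laptops; the
    "altered" group would update from mobile devices. If the number of
    participants is odd, the "altered" group gets the extra person.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "altered": shuffled[half:]}


if __name__ == "__main__":
    # Hypothetical participant IDs P01..P10.
    participants = [f"P{i:02d}" for i in range(1, 11)]
    groups = randomly_assign(participants, seed=42)
    for name, members in groups.items():
        print(name, members)
```

Fixing `seed` makes the assignment reproducible, which helps when the procedure has to be documented and verified; omitting it gives a fresh random split on each run.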

We then outline how Johnny should run his study. The goal is to demonstrate a framework for collecting data in a verifiable way, so that statistical findings can be interpreted from it easily. To isolate the variable(s) of interest, we suggest Johnny create two groups of participants to be evaluated: a control group and an “altered” group (the only group exposed to the variable of interest). Here, the variable of interest is updating Twitter statuses through mobile devices. Following Schuyler Huck’s Reading Statistics and Research, we create a univariate study, in which only one variable of interest is present for evaluation. In future works and iterations, however, the applicatio