\section{\sc The ``Discussion and Feedback'' Component}

As discussed in one of the design decisions for this application, we recognized the importance of describing to the user:
\begin{itemize}
\item the implications of choosing certain inputs for the tests;
\item the implications of choosing the data for the tests;
\item the meaning of the obtained results;
\item the background behind the experiment itself.
\end{itemize}

In order to convey this information effectively, we decided to implement a separate component in the application whose sole focus is to provide this feedback and these discussion points to the user. This component also serves as a general notification system that can be used for application-related feedback, such as warning or error messages in case something goes wrong, either with the application code or with the input that the application receives from the user.

While not all such feedback and discussion points are included in the prototype of the application that we created, we have tried to include many of them in this report; they follow in this section. These and similar points of discussion can be added to the application iteratively, based on whatever additional feedback the users might want beyond what is included in this report.

Many articles mention what a researcher is testing to be the cause of a particular phenomenon (the researcher is testing what is known as the alternate hypothesis, H1), but they fail to mention the null hypothesis (H0), which is a standard fixture in the statistical literature. The null hypothesis is used in frequentist methods and is what a researcher tries to reject or fail to reject (not disprove).

Before you run any statistical test, you must first determine your alpha level, which is also called the ``significance level.'' By definition, the alpha level is the probability of rejecting the null hypothesis when the null hypothesis is true. In other words, it is the probability of making a wrong decision.

Most folks use an alpha level of 0.05. However, if you are analyzing airplane engine failures, you may want to lower the probability of making a wrong decision and use a smaller alpha. On the other hand, if you are making paper airplanes, you might be willing to increase alpha and accept the higher risk of making a wrong decision. Like all probabilities, alpha ranges from 0 to 1.

The standard alpha level is usually set at 0.05. Assuming that the null hypothesis is true, this means we may reject the null hypothesis only if the observed data are so unusual that they would have occurred by chance at most 5\% of the time.

Since the p-value is a probability just like alpha, p-values also range from 0 to 1.

For any statistical test, the probability of making a Type I error is denoted by the Greek letter alpha ($\alpha$). Type I errors occur when we reject a null hypothesis that is actually true. Thus, in the long run, for a test with a level of significance of $0.05 = 1/20$, a true null hypothesis will be rejected one out of every 20 times.

The confidence interval is the range of likely values for a population parameter, such as the population mean. For example, if you compute a 95\% confidence interval for the average price of a Cairn terrier, then you can be 95\% confident that the interval contains the true average price of all Cairn terriers.

If alpha equals 0.05, then your confidence level is 0.95.
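As a concrete illustration of this duality, the following minimal Python sketch runs a one-sample t-test against a hypothesized mean price of \$400 and builds the matching 95\% confidence interval by hand (the prices, the sample size, and the availability of \texttt{scipy} are all assumptions made for illustration):

\begin{verbatim}
# A minimal sketch of the alpha/confidence-level duality, using
# made-up Cairn terrier prices.
import numpy as np
from scipy import stats

prices = np.array([350, 420, 380, 450, 390, 410, 365, 440, 400, 430])
hypothesized_mean = 400    # H0: the true average price is $400
alpha = 0.05               # significance level

# One-sample t-test: how extreme are these data under H0?
t_stat, p_value = stats.ttest_1samp(prices, hypothesized_mean)

# 95% confidence interval built by hand from the t distribution.
n = len(prices)
mean = prices.mean()
sem = prices.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

print(f"p-value: {p_value:.3f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
print("reject H0" if p_value < alpha else "fail to reject H0")
\end{verbatim}

Whatever numbers are used, the two views agree: the test rejects at $\alpha = 0.05$ exactly when \$400 falls outside the 95\% interval.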
If you increase alpha, you both increase the probability of incorrectly rejecting the null hypothesis and decrease your confidence level. The confidence interval and the p-value will always lead you to the same conclusion. If the p-value is less than alpha (i.e., the result is statistically significant), then the confidence interval will NOT contain the hypothesized mean. If the p-value is greater than alpha (i.e., the result is not statistically significant), then the confidence interval will include the hypothesized mean.

Alpha sets the standard for how extreme the data must be before we can reject the null hypothesis. The p-value indicates how extreme the data actually are. We compare the p-value with alpha to determine whether the observed data are statistically significantly different from the null hypothesis.

The end result of any statistical test (such as a t-test, Wilcoxon, or ANOVA) is a p-value. The ``p'' stands for probability, and it measures how likely it is that any observed difference between groups (such as a ``treatment'' group and a ``control'' group) is due to chance alone. More precisely, the p-value is the probability of observing data at least as extreme as those actually obtained, assuming the null hypothesis (H0) is true; it is not the probability of rejecting the null hypothesis. Being a probability, p can take any value between 0 and 1. Values close to 0 indicate that the observed difference is unlikely to be due to chance (0.05 is the industry-standard cutoff), whereas a p-value close to 1 suggests there is no difference between the groups other than that due to random variation. The p-value measures the strength of the evidence against the null hypothesis.

To test a hypothesis (say, which of two new informatics courses is more effective in teaching the material), you need a single p-value for that hypothesis (such as one from a t-test comparing the two courses), not two separate p-values addressing entirely different hypotheses (such as whether each course is better than a standard one); do not compare p-values with each other.

The aim of hypothesis testing is to evaluate how likely the observed difference would be if the null hypothesis were true.

If your p-value is lower than 0.05, you can reject the null hypothesis; this indicates (but does not prove) that your alternate hypothesis is the more likely explanation.

A useful mnemonic: if the p-value is low, the null must go. If the p-value is less than alpha (normally 0.05), the risk you are willing to take of making a wrong decision, then you reject the null hypothesis. For example, if the p-value were 0.02 and we were using an alpha of 0.05, we would reject the null hypothesis and conclude that the average price of a Cairn terrier is NOT \$400.

If the p-value is greater than alpha (normally 0.05), then we fail to reject the null hypothesis. Or, to put it another way: if the p-value is high, the null will fly. In that case, the phenomenon in question is not statistically significant.

Statistical tests are applied to data to generate the p-values used to test hypotheses.

The t-test and the Wilcoxon test are two well-known statistical tests. Both are used to test for differences between two groups (such as boys and girls, or a treatment group and a control group) with respect to a continuous dependent variable that is not fixed (such as total self-esteem), where the independent variable is a fixed, categorical variable (such as gender). The t-test can be used even if the sample sizes are very small, as long as the variable within each group is normally distributed. With the t-test, the test statistic used to generate the p-value is compared against a t distribution with $n-1$ degrees of freedom, and it assumes normally distributed data.

Wilcoxon tests, by contrast, are used when the data are skewed, while t-tests are used when the data are normally distributed. To run a Wilcoxon test, the data must first be converted to ``ranks''; the p-value is then calculated by comparing the observed differences in ranks to an expected distribution of differences in ranks.
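The sketch below runs both tests on the same two groups (the scores are invented, and \texttt{scipy} is again assumed to be available); \texttt{ranksums} is SciPy's implementation of the Wilcoxon rank-sum test for two independent samples:

\begin{verbatim}
# A minimal sketch comparing two groups with a t-test and a
# Wilcoxon rank-sum test; the scores below are invented.
import numpy as np
from scipy import stats

treatment = np.array([23.1, 25.4, 24.8, 26.0, 25.1, 27.3, 24.2, 26.5])
control   = np.array([21.9, 23.0, 22.4, 24.1, 22.8, 23.5, 21.5, 23.9])

# t-test: appropriate when each group is roughly normally distributed.
t_stat, t_p = stats.ttest_ind(treatment, control)

# Wilcoxon rank-sum test: a rank-based alternative for skewed data.
w_stat, w_p = stats.ranksums(treatment, control)

print(f"t-test p-value:        {t_p:.4f}")
print(f"rank-sum test p-value: {w_p:.4f}")
\end{verbatim}

For roughly normal data the two p-values are usually similar; for skewed data the rank-based test is the safer choice.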
While the t-test is used to compare the means of two groups, ANOVA (analysis of variance) is a statistical procedure used to compare means across three or more groups. Despite its name, ANOVA is concerned with differences between the means of groups, not differences between variances.

Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain under a specific hypothesis. The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and the observed results.

Variance and standard deviation are both measures of variation: they show how much diversity exists in a distribution. Both the variance and the standard deviation increase or decrease according to how closely the data cluster around the mean.

The standard deviation is a measure of how spread out the numbers in a distribution are. It indicates how much, on average, each value in the distribution deviates from the center (the mean) of the distribution. It is calculated by taking the square root of the variance.

Variance is the average of the squared deviations from the mean. To calculate the variance, you first subtract the mean from each number and square the results to find the squared differences; the average of those squared differences is the variance.

Formal sample size calculation helps us think through what research can and cannot tell us. A small study has a good chance of failing to reject the null hypothesis even when it is false.

When planning a study, you have to work out the sample size you will need, and to do that you first have to settle the aim of your study: estimation or hypothesis testing. When the aim is hypothesis testing, the sample size you need depends on statistical power. Statistical power is the probability that, if there is truly an effect of a particular size, you will reject the null hypothesis.

If the aim of your study is estimation, then the sample size you need depends on how precise you would like your estimate to be, where precision means the width of your confidence interval. The width of the confidence interval shrinks with the inverse of the square root of the sample size: if you want to halve the width of your confidence interval, you have to quadruple your sample size. As a rough formula, precision is the variation in the data divided by the square root of the sample size.

There are formulas to work out the required sample size exactly, but you have to be careful that the numbers you put into them are sensible for the results to be meaningful.

The power of any test of statistical significance is the probability that it will reject a false null hypothesis. Power is usually fixed at 80\% or 90\% when a study is planned.
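A short numeric sketch (the data are invented, and \texttt{numpy} is assumed to be available) ties these definitions together: it computes the variance and standard deviation exactly as described above, and then shows that quadrupling a hypothetical sample size halves the approximate width of a 95\% confidence interval:

\begin{verbatim}
# A minimal sketch, with invented data: variance as the average
# squared deviation, standard deviation as its square root, and
# CI width shrinking with the square root of the sample size.
import numpy as np

data = np.array([4.0, 7.0, 6.0, 5.0, 8.0, 6.0, 5.0, 7.0])

# Variance: average of the squared deviations from the mean.
mean = data.mean()
squared_diffs = (data - mean) ** 2
variance = squared_diffs.mean()     # population variance
std_dev = np.sqrt(variance)         # standard deviation

print(f"mean {mean:.2f}, variance {variance:.2f}, "
      f"std dev {std_dev:.2f}")

# Precision: the CI half-width scales as variation / sqrt(n), so
# quadrupling the sample size halves the width of the interval.
for n in (25, 100):                 # hypothetical sample sizes
    half_width = 1.96 * std_dev / np.sqrt(n)  # approx. 95% half-width
    print(f"n = {n:3d}: 95% CI half-width ~ {half_width:.3f}")
\end{verbatim}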
\begin{itemize}
\item A `random' and statistically sound sample can be achieved by numerical selection. Serial (or numerical) sampling is one of the three most widely accepted methods; it is not completely random, but it can provide an acceptable base for most purposes.
\item Serial (or numerical) sampling is simple in concept, i.e., selecting every 3rd or 10th item, person, etc., according to categories assigned in the data; it can also be based on far more complex criteria, e.g., picking the informatics courses that have a certain digit, such as 5, in the course number (see the sketch at the end of this section).
\item This method might run into an issue if there is an overlap between the categories assigned in the data and your selection criterion (e.g., every 3rd and every 5th person).
\end{itemize}

Guidelines:
\begin{itemize}
\item A serial sample may be acceptable for statistical study if the existing order of the whole body of records is random (e.g., a series of returns filed in no systematic order).
\item A serial sample is the only practicable method of sampling if the individual items cannot be separated, so that the whole body of material must be treated as one unit.
\item This method should not be used if there is an alphabetical, topographical, or chronological arrangement to the records.
\end{itemize}

The degrees of freedom of a data set is the number of values that are free to vary; for many common tests it is $n-1$, where $n$ is the number of values in the data set.
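To close, here is a minimal Python sketch of serial (numerical) sampling; the records, the every-3rd interval, and the digit-5 course criterion are all invented for illustration:

\begin{verbatim}
# A minimal sketch of serial (numerical) sampling; the records and
# the selection criteria below are invented.
records = [f"record-{i:03d}" for i in range(1, 31)]  # 30 records

# Simple serial selection: every 3rd item from the existing order.
every_third = records[2::3]
print(every_third)

# A more complex criterion: hypothetical informatics course numbers,
# keeping only those whose number includes the digit 5.
courses = ["INF43", "INF55", "INF101", "INF115", "INF141", "INF151"]
with_digit_5 = [c for c in courses if "5" in c[3:]]
print(with_digit_5)
\end{verbatim}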