Perhaps the biggest takeaway from working on this project has been understanding the limitations of different quantitative methods and the limitations of using “big data” at large. In reference to “big data,” we are reminded of a common theme in academia and the workplace. As John King states, “numbers beat no numbers every time.” In effect, we are constantly told to produce “proof,” or numbers, to support claims, whether they relate to work performance, phenomena in research, or anything else that requires some validity. What is often not discussed, however, is how whoever collects the data frames what is deemed “important.” In essence, each researcher becomes his or her own infrastructure, dispensing a particular viewpoint on the world that is validated by the medium of statistics. The numbers become a tool for justifying that viewpoint, because now there is “proof” to validate the theory, and statistical concepts to verify that the data is legitimate.

Particularly when using Google Trends, it becomes apparent that the way information is categorized creates value assessments that can ignore entire populations or groups of data. Not all search engine users phrase their queries in natural language, so Google Trends, which depends wholly on a collection of search terms to reveal trends in the data, does not reveal information about users who might ask the same question with different terms or in a different language altogether. There is also the issue of some information being designed so that it will not be accessible to Google or other search engines, which is sure to affect the information acquired through Google Trends. Additionally, Google is a corporate entity with a proprietary algorithm for obtaining its data, so there is no way to know how its algorithms shape the findings that become Google Trends data. What is known is that Google normalizes its data, and in so doing, refines and manipulates what is revealed in its data sets; a short sketch at the end of this section illustrates the effect of this kind of rescaling.

Additionally, in beginning to use statistical methods to work with big data, it becomes apparent that statistical methods primarily function to reveal patterns, but do nothing to explain why those patterns emerge. There is also a black box around certain types of data (like financial records) that might be important to understand and study but are very difficult to obtain because of privacy concerns. In effect, there is data that is accessible but not necessarily desirable, and vice versa, and as individuals (and researchers) we are constantly straddling the fence between what we deem relevant and important. In the process, we create our own value assessments about what is necessary for observation and investigation.

For future consideration, when designing applications like Johnny Takes on Stats in Informatics, designers must be aware of the implications of using big data sets and of how they purvey that knowledge to users (i.e., future researchers) who are learning statistical methods. The goal is to make users aware of the limitations of the data so that they can better understand what claims can be made and how the data can be interpreted accurately.
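
To make the normalization point concrete, the following sketch (written in Python, with made-up numbers, since Google’s actual pipeline is proprietary and not public) shows the kind of rescaling Google describes for Trends data: each query series is mapped to a 0-100 index relative to its own peak, which erases absolute search volumes and can make queries of very different popularity look alike.

\begin{verbatim}
# A minimal sketch of Trends-style normalization (not Google's
# actual, proprietary method): rescale each series to a 0-100
# index relative to its own peak, hiding absolute volumes.
def normalize_to_index(counts):
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]

# Hypothetical raw query counts for two very different queries.
common_query = [120, 340, 560, 890, 410]
rare_query   = [2, 5, 9, 14, 6]

print(normalize_to_index(common_query))  # [13, 38, 63, 100, 46]
print(normalize_to_index(rare_query))    # [14, 36, 64, 100, 43]
# The two indices look almost identical even though one query is
# roughly 60 times more common than the other.
\end{verbatim}

Any such rescaling is a design choice: it makes series comparable within a window, but it discards information about how much search activity actually occurred, which is precisely the kind of value assessment discussed above.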