Vijay Krishna Palepu renamed sectionsc_Conclusion.tex to Conclusion.tex  almost 11 years ago

Commit id: 7552e761bfbe798a80af17372239bfda03b64973

\section{\sc Conclusion}

\subsection{Future Work}

With D3 we are able to manipulate the data according to the user's interests. One of the original goals was to incorporate the CrossFilter JavaScript API. Incorporating it would let the user manipulate the two datasets with more precision, because CrossFilter can compute different dimensions of the data. At present we only allow the user to group the data by one dimension of their choosing, which is a time range. With CrossFilter we could also have selected records within a particular value range and exposed more dimensions for the user to explore; a brief sketch of such an integration appears later in this section.

\subsection{What did we learn this term?}

Perhaps the biggest takeaway from working on this project has been understanding the limitations of different quantitative methods, and the limitations of using ``big data'' at large.

With respect to ``big data,'' we are reminded of a common theme in academia and the workplace. As John King states, ``numbers beat no numbers every time.'' In effect, we are constantly told to produce ``proof,'' or numbers, to support claims, whether they relate to work performance, phenomena in research, or any other external claim that requires validation. What is often not discussed, however, is how whoever collects the data frames what is deemed ``important.'' In essence, each researcher becomes his own infrastructure, dispensing a particular viewpoint on the world that is validated by the medium of statistics. The numbers become a tool for justifying that viewpoint, because now there is ``proof'' to validate the theory and statistical concepts to verify that the data are legitimate.

When using Google Trends in particular, it becomes apparent that the way information is categorized creates value assessments that can ignore entire populations or groups of data. Not all search-engine users phrase their queries the same way, so Google Trends, which depends entirely on collections of search terms to reveal trends, says nothing about users who ask the same question with different terms or in a different language altogether. Some information is also designed so that it will not be accessible to Google or other search engines, which is sure to affect what Google Trends can report. Additionally, Google is a corporate entity with a proprietary algorithm for obtaining its data, so there is no way to know how its algorithms shape the findings that become Google Trends data. What is known is that Google normalizes its data, and in so doing refines and manipulates what is revealed in its data sets.

Furthermore, in beginning to use statistical methods to work with big data, it becomes apparent that these methods primarily reveal patterns but do little to explain why those patterns emerge. Certain types of data (such as financial records) are also effectively a black box: they might be important to understand and study, but they are very difficult to obtain because of privacy concerns. In effect, there is data that is accessible but not necessarily desirable, and vice versa, and as individuals (and researchers) we are constantly straddling the fence between what we deem relevant and important. In the process, we are creating our own value assessments about what is necessary for observation and investigation.
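To make the CrossFilter idea from the Future Work subsection concrete, the following JavaScript fragment is a minimal sketch of how such an integration might look. The record layout (the \texttt{date} and \texttt{interest} fields), the variable names, and the 40--80 value range are illustrative assumptions, not part of the current implementation.

\begin{verbatim}
// Hypothetical records: one Google Trends observation per week.
// Field names (date, interest) are assumptions for illustration.
var records = [
  { date: new Date(2013, 0, 7),  interest: 42 },
  { date: new Date(2013, 0, 14), interest: 57 }
  // ...
];

// Build a crossfilter over the records and declare two dimensions.
var cf         = crossfilter(records);
var byDate     = cf.dimension(function(d) { return d.date; });
var byInterest = cf.dimension(function(d) { return d.interest; });

// Filter on a value range, not just a time range: keep only the
// weeks whose interest score falls between 40 and 80.
byInterest.filterRange([40, 80]);

// Group the surviving records by month for the D3 view to render.
var perMonth = byDate.group(function(d) { return d3.time.month(d); });

// perMonth.all() now reflects the value-range filter and could be
// bound to the existing D3 chart in place of the raw dataset.
var grouped = perMonth.all();
\end{verbatim}

The appeal of this design is that filters on different dimensions compose automatically, so a time-range selection in the D3 view and a value-range filter like the one above would narrow the same underlying record set without any manual bookkeeping.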
Going forward, designers of applications like Johnny Takes on Stats in Informatics must be aware of the implications of using big data sets and of how those applications convey that knowledge to users (i.e., future researchers) who are learning statistical methods. The goal is to make users aware of the limitations of the data, so that they better understand what claims can be made and how the data can be interpreted accurately.