diff --git a/sectionsc_Conclusion.tex b/sectionsc_Conclusion.tex
index 82f50b4..063f7bd 100644
--- a/sectionsc_Conclusion.tex
+++ b/sectionsc_Conclusion.tex
...
Perhaps the biggest takeaway from working on this project has been understanding the limitations of different quantitative methods and the limitations of using “big data” at large.
In reference to using “big data,” we are reminded of a common theme in academia and the workplace. As John King states, “numbers beat no numbers every time.” In effect, we are constantly told to produce “proof,” or numbers, to support claims, whether they relate to work performance, phenomena in research, or any other external claim that requires some validity. What is often not discussed, however, is how whoever collects the data frames what is deemed “important.” In essence, each researcher becomes his own infrastructure, dispensing a particular viewpoint on the world that is validated by the medium of statistics. The numbers become a tool for justifying the viewpoint, because now there is “proof” to validate his theory, and statistical concepts to verify that the data is legitimate.
Particularly when using Google Trends, it becomes apparent that the way information is categorized creates value assessments that can ignore entire populations or groups of data. Not all search engine users phrase their queries as natural-language questions, so Google Trends, which is wholly dependent on a collection of search terms to reveal trends in the data, reveals nothing about users who ask the same question with different terms or in a different language altogether. There is also the issue of some information being designed so that it will not be accessible to Google or other search engines, which is sure to have an effect on the information acquired through Google Trends. Additionally, Google is a corporate entity with a proprietary algorithm for obtaining its data, so there is no way to know how its algorithms shape the findings that become Google Trends data. What is known is that Google normalizes its data, and in so doing refines and manipulates what is revealed in its data sets.
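The effect of that normalization can be sketched in a few lines. The sketch below assumes, consistent with how Google describes Trends data, that each series is rescaled so its peak value becomes 100; the raw query counts are hypothetical, and the point is that absolute volume is discarded in the process.

```python
# Trends-style normalization (assumption: each series is rescaled so its
# peak becomes 100; the raw counts below are hypothetical).
def normalize(series):
    peak = max(series)
    return [round(100 * v / peak) for v in series]

big_term   = [50_000, 80_000, 100_000, 60_000]  # heavily searched term
niche_term = [5, 8, 10, 6]                      # 10,000x less traffic

print(normalize(big_term))    # [50, 80, 100, 60]
print(normalize(niche_term))  # [50, 80, 100, 60] -- identical after scaling
```

After normalization the two terms are indistinguishable, even though one is searched ten thousand times more often than the other, which is one concrete way the normalization step manipulates what the data set can reveal.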
Additionally, in beginning to use statistical methods to work with big data, it becomes apparent that these methods primarily function to reveal patterns, but do nothing to explain why those patterns emerge. There is also a black box around certain types of data (like financial records) that might be important to understand and study but are very difficult to obtain because of privacy concerns. In effect, there is data that is accessible but not necessarily desirable, and vice versa, and as individuals (and researchers) we are constantly straddling the fence between what we deem as relevant and important. In the process we create our own value assessments about what is necessary for observation and investigation.
For future consideration, when designing applications like Johnny Takes on Stats in Informatics, designers must be aware of the implications of using big data sets and of how they convey that knowledge to users (i.e., future researchers) who are learning statistical methods. The goal is to foster awareness of the limitations of the data, so that users better understand what claims can be made and how the data can be interpreted accurately.