The “crisis of reproducibility” has been a significant source of controversy, heated debate, and calls for reform of institutional science in recent years. As a long-term response to both the present crisis and future challenges, I propose the creation of a new form of research organization whose purpose would be to conduct random audits of the scientific literature. I suggest that data analytics applied to a digitized scientific corpus may play a critical role in allowing broadly educated scientists to identify linchpin results worth investigating in further detail, across all disciplines. I argue that a simple “mock” trial run of a simplified auditing firm, staffed by one or two researchers over a short period, would provide valuable insight into the feasibility of this proposal.
[A longer version of this article is available via PubPub]
Can non-specialists with advanced scientific training identify key results worth replicating in a field that they have little to no experience with? If so, there would be profound consequences for the future of science. In particular, it would allow for the creation of a new form of research organization, which I term scientific auditing firms. Their primary responsibility would be to conduct random, systematically identified audits of the scientific literature. In addition to creating a disincentive for those who might otherwise engage in fraudulent practices, the existence of full-time, independent auditing firms would give academia and industry greater confidence in the reliability of the scientific corpus.
In such an organization, there would be a concentration of outstanding scientists exposed to the breadth of research produced by the entire scientific establishment. This concentration would have significant secondary implications. For instance, auditing firms might also come to serve as global monitors of scientific progress, issuing regular technical reports on contemporary developments, collaborating with filmmakers on documentaries about developments of particular importance to the public, or offering technical consulting services to academia and industry.
Why is it essential that these results be identifiable by non-experts? The explosive growth of the scientific enterprise following the Second World War has been accompanied by a trend towards hyper-specialization. Consequently, a thorough understanding of a given research result almost always requires extensive training in a specific field. It would therefore not be feasible for a single organization to employ specialists from every field.
We have, however, a powerful set of tools that have only recently come into existence, namely, a digitized scientific corpus and the techniques of modern data science (Markowitz et al., 2015; Ding, 2011; Ding, 2011a; Ding et al., 2013; Zhu et al., 2013; Zhu et al., 2015; Song et al., 2014; Valverde et al., 2007; Gress, 2010; Solé et al., 2013). For analyzing the scientific corpus, relevant data science techniques include citation network analysis, natural language processing, and many other statistical methods developed for processing large data sets. Using these tools, it may be possible for an experienced scientist with strong quantitative skills to identify experiments or results that merit further investigation, even when they lie outside the scope of that scientist's own training. The identification of such results would constitute the first step of conducting a “scientific audit.”
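As a rough illustration of how citation network analysis might help a non-specialist shortlist candidate results, the sketch below ranks papers in a toy citation graph with plain PageRank, a simplified, unweighted relative of the PageRank-based methods in the citation-analysis literature cited above (e.g., Ding, 2011). This is not a method proposed in this article; the function name pagerank, the paper identifiers, and the edges are all hypothetical. A high score only flags a paper as structurally central to its field, and therefore a candidate for closer inspection, not as suspect.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank papers by PageRank over a citation graph (hypothetical helper).

    citations maps each paper ID to the list of paper IDs it cites.
    An edge A -> B means "A cites B", so influence flows to cited papers.
    """
    # Collect every paper that appears as a citing or a cited paper.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(max_iter):
        # Papers that cite nothing (dangling nodes) spread their rank uniformly.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        # Each citing paper splits its damped rank evenly among the papers it cites.
        for paper, cited in citations.items():
            if cited:
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
        # Stop once the total change in scores is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy corpus: each key cites the papers in its list.
toy_corpus = {
    "P1": ["P3"],
    "P2": ["P3", "P4"],
    "P4": ["P3"],
    "P5": ["P3", "P4"],
}
scores = pagerank(toy_corpus)
shortlist = sorted(scores, key=scores.get, reverse=True)[:2]
print(shortlist)  # ['P3', 'P4']
```

In practice an auditor would combine such structural scores with other signals (for example, text-based features of the kind studied by Markowitz et al., 2015) rather than rely on citation centrality alone.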
Subsequent steps, which would require the participation of specialists from the field in question, might range from full-scale replication of an experiment, to writing a review article or a set of tutorials on novel statistical techniques, to coordinating the investigation of a result with alternative methods through a network of collaborating laboratories.
The ultimate consequences of random scientific audits would be more than intellectual. Although it is difficult to quantify, the reproducibility crisis has come at a steep cost to science, industry, and society as a whole. Poor-quality and outright fraudulent research have together wasted substantial financial resources, much of which came from the tax-paying public. Beyond the cost of delayed scientific and technological development, there is now the additional cost of investigating and characterizing the severity of the problem itself. The recent analyses that revealed large numbers of problematic studies covered a limited range of subjects, and we can hardly claim to know how such studies are distributed across the entirety of science (Ioannidis, 2005; Campbell, 2015; Goodman and Greenland, 2007; Horton, 2015; Prinz et al., 2011; Alberts et al., 2014; Gunn, 2014; Adam et al., 2002; Check et al., 2005; Bouri et al., 2014).
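One reason random selection matters here: if audited results are drawn at random from a defined population, the fraction that fails to hold up is an unbiased estimate of the failure rate in that population, and its uncertainty can be quantified. The sketch below computes a Wilson score interval for that fraction. It assumes, as a simplification, that each audit yields a simple pass/fail outcome; the function name wilson_interval is hypothetical, and the counts in the usage line are invented for illustration and are not findings.

```python
import math


def wilson_interval(failures, audited, z=1.96):
    """Wilson score interval for the proportion of audited results that fail.

    Assumes a simple random sample of audited results with a binary
    pass/fail outcome. z = 1.96 gives an approximate 95% interval.
    """
    if audited <= 0:
        raise ValueError("audited must be positive")
    p_hat = failures / audited
    z2 = z * z
    denom = 1.0 + z2 / audited
    # The Wilson center is shifted from p_hat toward 0.5 for small samples.
    center = (p_hat + z2 / (2 * audited)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / audited + z2 / (4 * audited * audited)
    )
    return center - half_width, center + half_width


# Hypothetical numbers for illustration only: 9 of 60 randomly audited results fail.
low, high = wilson_interval(failures=9, audited=60)
print(f"estimated failure rate, approx. 95% CI: {low:.3f} to {high:.3f}")
```

The Wilson interval is used rather than the simple normal approximation because it stays within [0, 1] and behaves sensibly when failures are rare or the number of audits is small.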
As described above, the fundamental notion of a scientific auditing firm is quite simply stated. It would be a completely neutral organization, with no research objectives of its own, whose primary purpose would be to conduct random, systematically identified audits of the scientific literature. Nevertheless, the practicalities of how such an organization would operate, its relationship to the university system, and the network of relationships required to conduct an audit are likely to be complex and to involve many subtleties that we cannot yet anticipate.
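What “random, systematically identified” selection should mean in practice is left open here. One simple possibility, assumed purely for illustration, is stratified random sampling: draw a fixed number of papers from each discipline so that no field is skipped by chance, using a recorded seed so that anyone can reproduce and check the draw. The function draw_audit_sample, the field names, and the paper IDs below are hypothetical.

```python
import random


def draw_audit_sample(corpus, per_field, seed):
    """Draw a reproducible stratified random sample of papers to audit.

    corpus maps a field name to a list of paper IDs in that field.
    At most per_field papers are drawn from each field, without replacement.
    """
    rng = random.Random(seed)  # dedicated generator so the draw is reproducible
    sample = {}
    for field in sorted(corpus):  # fixed iteration order keeps results seed-stable
        papers = corpus[field]
        k = min(per_field, len(papers))  # small fields are audited in full
        sample[field] = rng.sample(papers, k)
    return sample


# Hypothetical corpus with fields of very different sizes.
corpus = {
    "ecology": [f"eco-{i}" for i in range(50)],
    "oncology": [f"onc-{i}" for i in range(200)],
    "psychology": [f"psy-{i}" for i in range(3)],
}
audit = draw_audit_sample(corpus, per_field=5, seed=2016)
for field, papers in audit.items():
    print(field, papers)
```

Equal allocation per field is only one design choice; allocating audits in proportion to a field's output, or weighting toward results flagged by corpus analytics, would trade even coverage for other goals.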
Therefore, in order to evaluate the feasibility of establishing full-fledged, independent scientific auditing firms, I propose that we take an empirical stance and conduct a simple experiment to answer the question that motivated this article: Can non-specialists with advanced scientific training identify key results worth replicating in a field that they have little to no experience with? The experiment would consist of funding one or two researchers with broad scientific training and data science experience to conduct a “mock” trial run of an auditing firm. The goal would be to understand the challenges non-experts face in identifying critical results to investigate in fields outside their direct scientific training. For this initial experiment, we would not need to proceed with the auditing process itself; simply understanding the challenges of the identification phase would be valuable.
We would learn many lessons from conducting such an experiment, ranging from the skills and experience required of scientific auditors, to the limitations of current data science toolkits for analyzing the scientific corpus, to the value of old-fashioned “investigative journalism” in the auditing process. We would also be forced to confront issues related to open access to the scientific literature, and to ask whether partial availability of the research corpus in a given discipline is sufficient to reliably identify linchpin results.
The reproducibility crisis is a deeply troubling development that should motivate us to think critically and creatively about the future health of institutional science. In addition to the many reforms being proposed today, scientific auditing firms merit serious consideration as a long-term means of ensuring the reliability of published results. While there is much to be gained by discussing the practicalities and nuances of this proposal, the fundamental question stated above can be evaluated empirically. It would cost very little to conduct a “mock” trial run of a simplified auditing firm, and the outcome of this experiment would inform whether further consideration of this idea is merited.
I would like to acknowledge Seshu Sarma and Caroline Schwenz for feedback on the manuscript.
David M Markowitz, Jeffrey T Hancock. Linguistic Obfuscation in Fraudulent Science. Journal of Language and Social Psychology 0261927X15614605 (2015).
Ying Ding. Applying weighted PageRank to author citation networks. Journal of the American Society for Information Science and Technology 62, 236–245 (2011).
Ying Ding. Topic-based PageRank on author cocitation networks. Journal of the American Society for Information Science and Technology 62, 449–466 (2011).
Ying Ding, Xiaozhong Liu, Chun Guo, Blaise Cronin. The distribution of references across texts: Some implications for citation analysis. Journal of Informetrics 7, 583–592 (2013).
Wenjia Zhu, Jiancheng Guan. A bibliometric study of service innovation research: based on complex network analysis. Scientometrics 94, 1195–1216 (2013).
Xiaodan Zhu, Peter Turney, Daniel Lemire, André Vellino. Measuring academic influence: Not all citations are equal. Journal of the Association for Information Science and Technology 66, 408–427 (2015).
Min Song, SuYeon Kim, Guo Zhang, Ying Ding, Tamy Chambers. Productivity and influence in bioinformatics: A bibliometric analysis using PubMed Central. Journal of the Association for Information Science and Technology 65, 352–371 (2014).
Sergi Valverde, Ricard V Solé, Mark A Bedau, Norman Packard. Topology and evolution of technology innovation networks. Physical Review E 76, 056118 (2007).
Bernard Gress. Properties of the USPTO patent citation network: 1963–2002. World Patent Information 32, 3–21 (2010).
Ricard V Solé, Sergi Valverde, Marti Rosas Casals, Stuart A Kauffman, Doyne Farmer, Niles Eldredge. The evolutionary ecology of technological innovations. Complexity 18, 15–27 (2013).
John P. A. Ioannidis. Why Most Published Research Findings Are False. PLoS Med 2, e124 (2005).
Philip Campbell. Challenges in Irreproducible Research. Nature 526 (2015).
Steven Goodman, Sander Greenland. Why Most Published Research Findings Are False: Problems in the Analysis. PLoS Med 4, e168 (2007).
Richard Horton. What’s medicine’s 5 sigma? The Lancet 385 (2015).
Florian Prinz, Thomas Schlange, Khusru Asadullah. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10 (2011).
Bruce Alberts, Marc W. Kirschner, Shirley Tilghman, Harold Varmus. Rescuing US biomedical research from its systemic flaws. Proceedings of the National Academy of Sciences 111, 5773–5777 (2014).
William Gunn. Reproducibility: fraud is not the big problem. Nature 505, 483–483 (2014).