EcoInfo_Manuscript

Jessica Couture, Rachael E. Blake, Matthew B. Jones, Gavin McDonald, Colette Ward

* NOTE that authors after the first author are currently in alphabetical order. This can be changed following further discussion.

WORK IN PROGRESS. DO NOT CITE.

Abstract


This is where the abstract goes.

Introduction

In order to enhance discovery, efficiency and transparency in their work, scientists have begun developing an approach to their disciplines known broadly as open science. This fledgling movement prioritizes the comprehensive, public sharing of scientific work, from initial brainstorming through final publication. As open science gains popularity within scientific institutions around the globe, many researchers are encountering difficulties in putting it into practice (citation). Although a number of tools have been designed to facilitate openness at each step of the scientific process, those deeply involved in this development acknowledge that the movement is still in its adolescence (citation).
 
Access to data is indispensable as a foundation of scientific inquiry and an essential element of a more transparent science, yet it faces both conceptual and practical obstacles. A historic culture of perceived ownership over one's data, combined with an environment of competition for funding and for publication space, serves as a cultural impediment to the adoption of data sharing practices (citation). Researchers' efforts to defend their ideas from theft and to secure publication opportunities often involve withholding data from the larger scientific community. Many commonly used data tools do not prepare data for publication or sharing, so this extra step requires additional training, effort and resources. While tools have been developed to help scientists responsibly archive their data in a manner that is accessible and preserved through time, these tools, and an understanding of their appropriate use, are still sparsely available to scientists.
 
As tools that facilitate data documentation and sharing become more available, one would expect data availability to increase over time. Larger and faster servers, together with more accessible tools such as GitHub, cloud storage and free online repositories, all popularized in the past decade, should lead to more sharing than in previous decades. On a more personal level, William Michener (1997) hypothesizes a temporal degradation in knowledge of one's own data, concluding that the older data are, the less information exists about them and their associated metadata, both in the collector's memory and in the physical form of the data. Given this conclusion, it is realistic to assume that older data will inherently be less accessible than newer data, even to those most intimately associated with them. Furthermore, increasing support for the open science movement suggests a growing willingness to share data and underscores the hypothesis that more data should be available from recent years than from earlier ones.
 
A number of studies have tested the availability of data under open publication requirements. These studies have involved relatively small samples and outreach efforts (n = 37, 141, 10 and 50, respectively) and have consistently recovered less than 30% of the data requested (Wolins 1962, Wicherts et al. 2006, Savage & Vickers 2009, Alsheikh-Ali et al. 2011). Would a larger and more current effort return a higher percentage of data than these previous studies?
 
In addition to variation over time, one might also expect differences in data sharing between research fields. Differences in the types of data and in their content, collection or analysis methods may produce trends in data openness along these lines. For example, physical data are often collected by instruments, can be downloaded automatically and undergo pre-defined analysis protocols. Scientists producing this sort of data may be more open to sharing it, since doing so should require little additional work and external interpretation should be straightforward. Alternatively, confidential or proprietary data, in the social sciences for example, could lead those in such fields to sequester original data.
 
Similar to expected trends across research fields, agency affiliation may influence one's willingness or ability to share. A scientist who collects data under a public entity may be more compelled, or internally required, to make the data available to the public. In contrast, private consulting firms may prefer to keep data in house in order to protect clients or to preserve the data's profitability. Many public entities have internal data sharing policies because they are publicly funded. Sectors such as government agencies could therefore face two requirements to share data publicly, one from their home agency and one from external funders, and may already have internal data sharing systems in place. We would therefore expect government groups to provide data more often than private organizations.
 
To test these trends, we focused our study on the data-collection efforts of the Exxon Valdez Oil Spill Trustee Council (EVOSTC). The EVOSTC was formed following the Exxon Valdez oil spill in the Gulf of Alaska in 1989 and has funded hundreds of projects since its inception. The EVOSTC requires all recipients of its grants to publish their data within one year of collection, but it does not specify archiving methods or provide a specific publication platform. In 2010 the EVOSTC funded the Gulf Watch Alaska group to create an open archive of all of the data collected under its grants, past and ongoing. Within this group, the National Center for Ecological Analysis and Synthesis was tasked with archiving all of the historic data (1989-2010) funded by the EVOSTC. These grants funded an array of government entities, private consulting firms and non-governmental organizations, as well as a few Alaska Native groups. The diversity of the grantees was compounded by the variety of scientific disciplines in which they operated. Within such a broad field of data collection, we wanted to know for how many of these projects we could acquire data, whether there were trends in data reporting based on data or grantee characteristics, and, where data were not procured, why we were unsuccessful. The EVOSTC did make an effort to collect these data in the mid-1990s, but the success of that effort is unknown because the contents of the collection have since been lost.