Jessica Couture, Rachael E. Blake, Matthew B. Jones, Gavin McDonald, Colette Ward

* NOTE that authors after the first author are currently in alphabetical order. This can be changed following further discussion



This is where the abstract goes.


To enhance discovery, efficiency, and transparency in their work, scientists have begun developing an approach to their disciplines known broadly as open science. This movement prioritizes the comprehensive, public sharing of scientific work, from initial brainstorming through final publication. As open science gains popularity within scientific institutions around the globe, many researchers are encountering difficulties in its application (citation). Although a number of tools have been designed to facilitate openness at each step of the scientific process, those deeply involved in their development acknowledge that the movement is still in its adolescence (citation).
Access to data is a foundation of scientific inquiry and an essential element of a more transparent science, yet it faces both conceptual and practical obstacles. A historic culture of perceived ownership over one's data, combined with an environment of competition for funding and publication space, impedes the adoption of data sharing practices (citation). Efforts to defend ideas from theft and to secure publication opportunities often involve withholding data from the larger scientific community. Many commonly used data tools neglect to prepare data for publication or sharing, and therefore require additional training, effort, and resources to complete this extra step. While tools have been developed to help scientists responsibly archive their data in a manner that is accessible and preserved through time, these tools, and an understanding of their appropriate use, remain sparsely available to scientists.
As tools that facilitate data documentation and sharing become more available, one would expect an increase in data availability over time. Larger and faster servers, and more accessible tools such as GitHub, cloud storage, and free online repositories, all popularized in the past decade, should lead to more sharing than in previous decades. On a more personal level, Michener (1997) hypothesizes a temporal degradation in knowledge of one's own data, concluding that the older the data are, the less information exists about the data and associated metadata, both in the collector's memory and in the physical form of the data. Given this conclusion, it is realistic to assume that older data will inherently be less accessible than newer data, even to those most intimately associated with them. Furthermore, increasing support for the open science movement suggests a growing willingness to share data and underscores the hypothesis that more data should be available from recent years than from earlier ones.
A number of studies have tested the availability of data under open publication requirements. These studies, based on relatively small sample sizes (n = 37, 141, 10, and 50, respectively) and direct outreach efforts, have consistently recovered less than 30% of the data requested (Wolins 1962, Wicherts et al. 2006, Savage & Vickers 2009, Alsheikh-Ali et al. 2011). Would a larger and more current effort return a higher percentage of data than previous studies?
In addition to variation over time, one might also expect differences in data sharing among research fields. Differences in data types, content, and collection or analysis methods may produce trends in data openness. For example, physical data are often collected by instruments, downloaded automatically, and analyzed under pre-defined protocols; scientists producing this sort of data may be more open to sharing, since publication should take little additional work and external interpretation should be straightforward. Alternatively, confidential or proprietary data, in the social sciences for example, could lead researchers in those fields to sequester original data.
Similar to expected trends among research fields, agency affiliation may influence one's willingness or ability to share. Scientists who collect data under a public entity may be more compelled, or internally required, to make their data available to the public, and many public entities have internal data sharing policies because they are publicly funded. Such groups, government agencies for example, could therefore face two requirements to share data publicly: one from their home agency and one from an external funder, and they may already have internal data sharing systems in place. In contrast, private consulting firms may prefer to keep data in house to protect clients or to increase the data's profitability. We would therefore expect government groups to provide data more often than private organizations.
To test these trends, we focused our study on the data collection efforts funded by the Exxon Valdez Oil Spill Trustee Council (EVOSTC). The EVOSTC was formed following the Exxon Valdez oil spill in the Gulf of Alaska in 1989 and has funded hundreds of projects since its inception. The EVOSTC requires all recipients of its grants to publish their data within one year of collection, but it does not specify archiving methods or provide a publication platform. In 2010 the EVOSTC funded the Gulf Watch Alaska group to create an open archive of all of the data collected under its grants, past and ongoing. Within this group, the National Center for Ecological Analysis and Synthesis was tasked with archiving all of the historic data (1989-2010) funded by the EVOSTC. These grants funded an array of government entities, private consulting firms, and non-governmental organizations, as well as a few Alaska Native groups; the diversity of the grantees was compounded by the variety of scientific disciplines in which they operated. Within such a broad field of data collection, we wanted to know for how many of these projects we could acquire data, whether there were trends in data reporting based on data or grantee characteristics, and, when data were not procured, why we were unsuccessful. The EVOSTC itself made an effort to collect these data in the mid-1990s, but the success of that effort is unknown because the contents of the collection have since been lost.


To assess our success, we asked: for how many EVOSTC-funded projects could we recover data? Among these projects we were interested in trends in reporting; in particular, we asked which grantees were more likely to provide data and whether there was a temporal trend, as predicted by Michener (1997). To test this, we asked whether data reporting differed based on any of three project characteristics: 1) data field, 2) grantee's agency sector, and 3) age of the data. For the data we were unable to recover, we wanted to know why the data were not shared. Since each funded project comprised an unknown number of "datasets", we based success on the publication of at least one dataset for a given project, regardless of size or complexity. Throughout our extensive data recovery effort, we took careful notes on our outreach efforts, communications, and progress in publicly archiving acquired data, for use in statistical analyses.
Data recovery and archiving
From 2012 to 2014, a team of one full-time and three part-time staff members was assigned to collect and archive data funded by the Exxon Valdez Oil Spill Trustee Council (EVOSTC), targeting specifically those projects funded between 1989 and 2010. Project information was obtained from the projects page of the EVOSTC website, which includes varying levels of detail for each project, ranging from the project title alone to full bibliographic information and attached reports. We tracked the progress of the data request and acquisition process for each project through five stages: "emailed", "replied", "sent data", "published", and "unrecoverable".
Contact information was obtained through agency websites and Google searches based on the information we were able to gather from the EVOSTC site. If we found contact information for the listed principal investigator, we made an initial outreach email or phone call explaining the data recovery project, citing the data requirements, and requesting the data for the project in question. Projects for which outreach could be made were labeled "emailed" and were followed up numerous times if no reply was received. A reply to the outreach confirmed that the contact information was correct, and the project label was promoted to "replied", regardless of the level of cooperation expressed in the response. If the responder determined the data were unrecoverable, the project was labeled as such and the reasons were recorded in our tracking system.
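The stage-promotion scheme described above can be sketched in a few lines of Python. This is purely illustrative: the stage names follow the text, but the code, the promotion rule, and the example project identifier are our own assumptions, not the tracking system the team actually used.

```python
# Illustrative sketch of the project-status tracking described in the text.
# Stage names come from the Methods; everything else is hypothetical.
STAGES = ["emailed", "replied", "sent data", "published", "unrecoverable"]

def promote(current: str, new: str) -> str:
    """Advance a project to a later stage; never move it backward."""
    if STAGES.index(new) > STAGES.index(current):
        return new
    return current

# Hypothetical usage: a project is contacted, then a reply is received.
projects = {"example-project": "emailed"}
projects["example-project"] = promote(projects["example-project"], "replied")
```

A one-directional promotion rule like this reflects how the text describes the process: a project's label only moves forward (e.g., from "emailed" to "replied") as the request progresses.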
Once we received data for a project, the label was promoted to "sent data".