To enhance discovery, efficiency, and
transparency in their work, many scientists have promoted an
approach known broadly as open science. Within open science
there is a strong focus on data sharing, on the premise that publicly
available data can have far greater impact than data restricted to the
original collector's analyses (Mascarelli 2009, Gewin 2013, Obama 2013, Hampton et al.
2015). Although the movement has been gaining momentum and a number of tools have
been designed to facilitate openness at each step of the scientific process,
those deeply involved in its development acknowledge that the
movement is still in its adolescence (Bjork 2004, Ram 2013, Hampton et al.
2015). Many researchers encounter both technical and cultural
difficulties in putting open science into practice (Costello et al.
2013, Van Noorden 2013, Hampton et al. 2015).
In contrast to this relatively recent trend, many funders
and publishers have long required their awardees and authors to share or
publish data. Despite these established policies, weak enforcement
has led to consistently low compliance (Wolins 1962, Wicherts et al. 2006,
Savage & Vickers 2009, Alsheikh-Ali et al. 2011, Vines et al. 2014). The
technical hurdles to sharing data have historically been many, and although
digitization should help overcome these obstacles, many funders and publishers
still fail to provide infrastructure and technical support for required data
publication. In addition, a long-standing culture of perceived ownership of one's
data, combined with an environment of competition for funding and for limited
publication space, impedes the adoption of open data
sharing practices (Sieber 1989, Hampton et al. 2015). In an effort to
defend intellectual novelty and secure publication opportunities, scientists
often withhold data from the larger scientific community.
Although data requirements have been enforced with similar
leniency by both funders and publishers, the nature of these preconditions
differs: a funder shares ownership of the collected data, whereas a publisher
is merely a platform, so any dispute concerns ownership of the publication
rather than of the data themselves. Researchers might therefore feel more compelled to
comply with a funder's requirements.
Alternatively, because publications are the currency of science, the
converse might occur. A number of studies have evaluated rates of
journal-specific data reporting, but none have focused on the success of
funder-driven data recovery, and the effectiveness of data sharing policies may
differ between these two kinds of institution. To test whether funding
requirements result in different rates of data sharing than journal
requirements, we determined whether the recovery rate for a specific funding
agency differed from the results of journal-specific data salvage efforts.
In addition to differences based on
who establishes the requirements, data sharing may vary with
other characteristics, such as the age of the data (number of years since the data
were produced), the research field, or the agency sector of the data collector. Growing
support for the open science movement suggests an increasing willingness to
share data, and hence that more data should be available from
recent years than from earlier ones. Michener et al. (1997) hypothesized a
temporal degradation in knowledge of one's own data: the
older the data are, the less information exists about them and their associated
metadata, both in the collector's memory and in the physical record.
Recent studies have supported this hypothesis (Vines et al. 2014, Baker 2017).
Any such age effect should be reinforced by the increased availability of data
documentation and sharing tools. Larger and faster servers, platforms such as
GitHub, cloud services, and free online repositories, all of which have been
popularized in the past decade, should lead to more data sharing overall than
in previous decades (Reichman et al. 2011, Ram 2013, Hampton et al. 2015).
Similarly, differences among research fields in data collection protocols,
instrumentation, or confidentiality of information may
lead to better data preservation in some disciplines than in others. For example, data
collected automatically or under well-established protocols may be more
easily shared than data requiring complex processing, or than confidential data
such as those involving human subjects. Research fields dominated by these
latter types of data may face hurdles that inhibit data
sharing and preservation. Furthermore, a scientist's agency affiliation may
influence willingness or ability to share data. A private consulting firm may
prefer to keep data private in order to protect client confidentiality or
increase profitability. In contrast, if
a scientist collects data under a public agency, their department may encourage or
even require data publication (Obama 2013). Many public government agencies
have both external and internal data sharing policies and are more likely to
provide established protocols and systems for data sharing by their employees.
Here, we assess the ability to retroactively archive ecological and environmental
data and evaluate patterns in data recovery for a single funding body. To test these trends,
we focused our study on the data-collection efforts of the Exxon Valdez Oil
Spill Trustee Council (EVOSTC). The EVOSTC, instituted to administer the
monetary damages paid by Exxon following the 1989 Exxon Valdez oil spill in the
Gulf of Alaska, has funded hundreds of projects since its inception.
The EVOSTC requires all grant recipients to publish their data within one year
of collection, but it does not specify archiving methods or
provide a specific publication platform. The EVOSTC did make an effort to
collect these data in the mid-1990s, but the success of that effort is unknown
because the contents of the collection have since been lost. EVOSTC grants have funded an array of
government entities, private consulting firms, and non-governmental
organizations, as well as several Alaska Native groups. This diversity of
grantees was compounded by the variety of scientific disciplines in
which they operated. We asked: 1) For how many of these projects
could we recover data? 2) Were there trends in data reporting based on data or
grantee characteristics? 3) If data were not procured, why we were