Materials and Methods

Link analysis

We analyzed a corpus of all articles published between 1997 and 2008 in the four main astronomy journals (The Astrophysical Journal, The Astrophysical Journal Letters, The Astrophysical Journal Supplement, The Astronomical Journal) that contain external links (URLs) in their full text. We initially found \(33847\) external links in \(13390\) articles. http://hdl.handle.net/10904/10214 \cite{astrocite}

To isolate potential links to datasets from this list, we applied the following filtering workflow. First, we removed links to domains that are scholarly repositories and that do not host data (or did not host data prior to 2008). These include domains such as dx.doi.org, arxiv.org, xxx.lanl.gov, and adsabs.harvard.edu. Removing links to these domains, which point to articles rather than to data, narrowed the corpus down to \(26663\) links.
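The domain-based step can be illustrated with a minimal Python sketch. It is not the code used in the study: the exclusion set contains only the four domains named above (the full list is not given here), the function names are hypothetical, and the DOI and arXiv identifiers in the sample are placeholders.

```python
from urllib.parse import urlparse

# Only the four domains named in the text; the complete exclusion
# list used in the study is not reproduced here (assumption).
EXCLUDED_DOMAINS = {
    "dx.doi.org",
    "arxiv.org",
    "xxx.lanl.gov",
    "adsabs.harvard.edu",
}


def is_repository_link(url):
    """Return True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def drop_repository_links(urls):
    """Keep only links that do not point to an excluded scholarly repository."""
    return [u for u in urls if not is_repository_link(u)]


# Placeholder identifiers, for illustration only.
sample = [
    "http://dx.doi.org/10.0000/placeholder",
    "http://arxiv.org/abs/placeholder",
    "http://www.sdss.org",
    "http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html",
]
kept = drop_repository_links(sample)
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids dropping links that merely mention an excluded domain in their path.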

Second, we removed all links which are found in the reference list of an article. While it is entirely possible that authors cite datasets in the same way as they cite bibliographic references, an exploratory analysis revealed that links in the reference section of a paper were, by and large, pointers to articles, preprints, star catalogs, circulars, manuals, and user guides. Therefore, we removed these “reference links”, bringing the corpus down to \(20767\) links.

Third, based on the observation that links to datasets are generally not found at the root of a website hierarchy, we removed links that contain fewer than two forward slashes (other than the two slashes found in the leading “http://”). For example, the link to http://www.sdss.org was dropped from the corpus (0 slashes), while the link to http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html was retained (3 slashes). This final filtering procedure reduced the corpus to \(13447\) links, which we consider potential links to datasets \cite{astrocite}. Some descriptive statistics about this corpus of links are presented in Table 1.
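The slash-count rule can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function names and the `min_slashes` parameter are hypothetical, and the sketch assumes that every slash after the scheme separator is counted, including trailing slashes and any slashes in a query string.

```python
def path_slash_count(url):
    """Count forward slashes in a URL, ignoring the two in the leading '://'."""
    _, sep, rest = url.partition("://")
    if not sep:
        # No scheme present: count slashes in the whole string.
        rest = url
    return rest.count("/")


def is_potential_dataset_link(url, min_slashes=2):
    """Return True if the link is deep enough in the site hierarchy to be retained."""
    return path_slash_count(url) >= min_slashes


root_link = "http://www.sdss.org"
deep_link = "http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html"

root_count = path_slash_count(root_link)  # 0 slashes, so the link is dropped
deep_count = path_slash_count(deep_link)  # 3 slashes, so the link is retained
```

Under this counting convention the two example links from the text yield 0 and 3 slashes respectively, matching the counts reported above.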

Survey questions

\label{sec:surveyquestions}

  • Question 1. Have you ever used DATA you learned about from reading a Journal article? Check ALL that apply.

    • manually entered data from a table in a paper

    • manually extracted data point values from a graph

    • downloaded e-table of ASCII data provided by Journal

    • contacted author to ask for data & got what I needed

    • contacted author to ask for data & did NOT get what I needed

    • used online archive where data were available

  • Question 2. When it comes to sharing DATA you’ve created, collected or curated, you have? Check ALL that apply.

    • emailed data to a colleague upon request.

    • put data at an ftp-style site for a colleague to retrieve.

    • put data at a personal web site

    • put data at a project-based web site

    • put data at an organized institutional archive

    • not shared my data, because I think it will endanger my career.

    • not shared my data due to large file sizes

    • not shared my data because I don’t know how.

    • not shared my data because it takes too much effort.

    • not shared my data because I don’t think anyone will want it.