Alberto Pepe added materials_amd_methods.tex
over 11 years ago
Commit id: 5691afb9e6279b8df1f9dc4f4ba2b0b9fc43e521
diff --git a/materials_amd_methods.tex b/materials_amd_methods.tex
new file mode 100644
index 0000000..1c7c193
--- /dev/null
+++ b/materials_amd_methods.tex
...
\subsection{Materials and Methods}
We analyze a corpus of all articles published between 1997 and 2008 in the four main
astronomy journals (The Astrophysical Journal, The Astrophysical Journal
Letters, Astronomy \& Astrophysics, The Astronomical Journal) that
contain external links (URLs) in their full text. We initially find $33847$ external links
in $13390$ articles.
To isolate potential links to datasets from this list, we
apply the following filtering workflow. First, we remove links to
domains that belong to scholarly repositories and do not
host data (or did not host data prior to 2008). These include
domains such as \url{dx.doi.org}, \url{arxiv.org}, \url{xxx.lanl.gov},
and \url{adsabs.harvard.edu}. Removing links to these domains, which
point to articles rather than data, narrows the corpus down to
$26663$ links.
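As an illustration, this first filter can be sketched in Python as follows. The function names \texttt{link\_domain} and \texttt{remove\_repository\_links}, and the choice of \texttt{urllib.parse} to extract the host, are assumptions made for this sketch rather than a description of the code used in the study. The domain set contains only the four examples named in the text, and hosts are matched exactly, so subdomains would need to be listed separately.

```python
from urllib.parse import urlparse

# Example repository domains named in the text. The complete list used in the
# study is not given, so this set is illustrative only.
REPOSITORY_DOMAINS = {
    "dx.doi.org",
    "arxiv.org",
    "xxx.lanl.gov",
    "adsabs.harvard.edu",
}


def link_domain(link):
    """Return the lowercase host of a link, tolerating a missing scheme."""
    if "://" not in link:
        # urlparse only fills in the host when a scheme separator is present.
        link = "http://" + link
    host = urlparse(link).hostname or ""
    return host.lower()


def remove_repository_links(links, domains=REPOSITORY_DOMAINS):
    """Keep only links whose host is not in the set of repository domains."""
    return [link for link in links if link_domain(link) not in domains]
```

Applied to the full list of external links, a filter of this kind would return the subset that does not point to the listed repositories.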
Second, we remove all links which are found in the reference list of
an article. While it is entirely possible that authors cite datasets in the
same way as they cite bibliographic references, an exploratory analysis revealed that links
in the reference section of a paper were, by and large, pointers to articles, preprints,
star catalogs, circulars, manuals, and user guides. Therefore, we
remove these ``reference links'', bringing the corpus down to $20767$
links.
Third, based on the observation that links to datasets are generally
not found at the root of a website hierarchy, we remove links that
contain fewer than two forward slashes (other than the two slashes found in
the leading ``http://''). For example, the link to
\url{http://www.sdss.org} is dropped from the corpus (0 slashes),
while the link to
\url{http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html}
is retained (3 slashes). This final filtering step reduces the
corpus to $13447$ links, which we consider potential links to datasets. Descriptive statistics for this corpus
of links are presented in Table~\ref{tab1}.
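The slash-count rule of the third filtering step can be sketched in Python as follows. The function names \texttt{count\_path\_slashes} and \texttt{is\_potential\_dataset\_link} and the parameter \texttt{min\_slashes} are hypothetical and introduced only for this example; the sketch assumes that the two slashes to be ignored are those of a leading scheme separator such as ``http://''.

```python
def count_path_slashes(link):
    """Count forward slashes, ignoring the two in a leading scheme (e.g. http://)."""
    scheme_end = link.find("://")
    # Skip past "://" when a scheme is present; otherwise use the whole string.
    rest = link[scheme_end + 3:] if scheme_end != -1 else link
    return rest.count("/")


def is_potential_dataset_link(link, min_slashes=2):
    """Return True if the link has at least min_slashes slashes after the scheme."""
    return count_path_slashes(link) >= min_slashes


examples = [
    "http://www.sdss.org",
    "http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html",
]
# Keep only the links that pass the slash-count rule.
kept = [link for link in examples if is_potential_dataset_link(link)]
```

With the two example links from the text, the first has 0 slashes after the scheme and is dropped, while the second has 3 and is kept.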