deletions | additions
diff --git a/Results_exploratory_analysis.tex b/Results_exploratory_analysis.tex
index 0586fb9..c6a5790 100644
--- a/Results_exploratory_analysis.tex
+++ b/Results_exploratory_analysis.tex
...
astronomers use links within articles to point to datasets and related
supplemental data resources.
We analyze a corpus of all articles published in the four main
American
astronomy journals (The Astrophysical Journal, The Astrophysical Journal
Letters, The Astrophysical Journal Supplement, The Astronomical Journal) between
1997 and
2008. We find a total of
$13{,}447$ potential links to datasets in a
total of
$7{,}641$ publications. The detailed procedure by which
potential data links are selected and filtered is
described in the Materials and Methods section.
In the barplot of Figure \ref{fig:fig1} we show how linking
practices have changed over time. Links to potential data resources in astronomy first appear in
1997, with only a couple of dozen links published in that year, and
the number quickly increases each year to reach around
$1{,}500$ links in
2005. After 2005, the volume of total published links roughly stays
the same every year. The graph shows that, with widespread use and
adoption of the Web, linking to online resources within published
articles has become more and more popular. The bars in the barplot of Figure
\ref{fig:fig1} also depict whether published links were still
available as of December 2011: the green portion of each bar represents
the volume of valid links (HTTP status code 200: OK), while the grey
portion of the bars represents broken links (HTTP status codes 3xx,
4xx, and 5xx). This link categorization shows that half or more of all
links published prior to 2001 were broken by 2011. The percentage of broken
links decreases with time, reaching roughly 10\% in 2008: one in ten links
included in astronomy papers in 2008 was unreachable three years later.
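The valid/broken categorization described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the figure: the function names and the (year, status) record layout are hypothetical, and treating a link with no HTTP response at all as broken is an added assumption.

```python
from collections import defaultdict


def is_broken(status):
    """Return True for a broken link under the categorization in the text.

    Only HTTP 200 counts as valid; 3xx, 4xx and 5xx count as broken.
    Assumption: a missing status (None, e.g. no response) is also broken.
    """
    return status != 200


def broken_fraction_by_year(records):
    """Map publication year -> fraction of links that are broken.

    `records` is an iterable of (year, status) pairs (hypothetical layout).
    """
    totals = defaultdict(int)
    broken = defaultdict(int)
    for year, status in records:
        totals[year] += 1
        if is_broken(status):
            broken[year] += 1
    return {year: broken[year] / totals[year] for year in sorted(totals)}


# Toy records for illustration only; these are not data from the study.
toy = [(1998, 404), (1998, 200), (2008, 200), (2008, 200), (2008, 301), (2008, 200)]
print(broken_fraction_by_year(toy))
```

Counting redirects (3xx) as broken follows the categorization stated in the text; a checker that follows redirects before recording the status would report fewer broken links.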
This analysis can be pushed further by exploring two distinct subsets
of the astronomy link corpus. In Figure \ref{fig:fig2} we show how
the percentages of broken links differ over time for a set of
$1{,}801$ links to personal websites
(approximated as links which contain the tilde symbol, \textasciitilde{}, which
is usually reserved for personal web pages on institutional servers)
and a set of
$3{,}731$ links to institutional, curated archives (a manually
selected list of domains that are obvious astronomy archives, such as
\url{archive.stsci.edu}).
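The split into these two subsets can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the three labels are hypothetical, the domain list is a placeholder holding only the one archive named in the text, and checking for the tilde before the domain is an arbitrary ordering choice.

```python
from urllib.parse import urlsplit

# Placeholder list: only the one domain named in the text; the actual
# manually selected list is not reproduced here.
CURATED_DOMAINS = {"archive.stsci.edu"}


def classify_link(url):
    """Return 'personal', 'curated', or 'other' for a link (hypothetical labels)."""
    # Heuristic from the text: a tilde suggests a personal web page.
    if "~" in url:
        return "personal"
    # urlsplit only fills in the host when a scheme or leading '//' is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    if host in CURATED_DOMAINS:
        return "curated"
    return "other"


print(classify_link("http://www.example.edu/~jdoe/table3.dat"))
print(classify_link("http://archive.stsci.edu/hst/"))
```

The host comparison here is an exact match, so subdomains of a listed archive would fall into the "other" group unless they are listed explicitly.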
Attempting to make a distinction between
these two categories of links is of crucial importance. The former set
of links, the ``tilde links'', are potential pointers to datasets
found on personal websites. These may consist of data tables and
images which are the product of data analysis and reduction procedures
described in the accompanying paper.
As such, they do not belong to larger curated archives, which
typically host raw data only. Ideally, these datasets would be included
in the full text of the article, but oftentimes they are too large to
fit within the format of a published paper and are included on a
personal server and linked from within the paper. The latter set of
...
data hosted on astronomers' personal websites, become unreachable much
faster than links to curated ``institutional datasets''.
These findings point to a preliminary realization: astronomers
appreciate, but cannot reliably meet, the
need to reference and include data materials in their published
work in order to preserve its value. Since they lack a standardized mechanism to
reference these resources --- data citations do not normally fit in the
format, structure, and scope of published journal articles --- they
attempt to cite datasets using simple linking from within
articles. Results from this preliminary analysis prompted a