Alberto Pepe generated LaTeX version of article
almost 11 years ago
Commit id: 97ff86064d263df3dd9f5615ac0dc47a8280f06e
% Template for PLoS
% Version 1.0 January 2009
%
% To compile to pdf, run:
% latex plos.template
% bibtex plos.template
% latex plos.template
% latex plos.template
% dvipdf plos.template
\documentclass[10pt]{article}
% amsmath package, useful for mathematical formulas
\usepackage{amsmath}
% amssymb package, useful for mathematical symbols
\usepackage{amssymb}
% graphicx package, useful for including eps and pdf graphics
% include graphics with the command \includegraphics
\usepackage{graphicx}
% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}
\usepackage{color}
\usepackage{url}
% Use doublespacing - comment out for single spacing
%\usepackage{setspace}
%\doublespacing
% Text layout
\topmargin 0.0cm
\oddsidemargin 0.5cm
\evensidemargin 0.5cm
\textwidth 16cm
\textheight 21cm
% Bold the 'Figure #' in the caption and separate it with a period
% Captions will be left justified
\usepackage[labelfont=bf,labelsep=period,justification=raggedright]{caption}
% Use the PLoS provided bibtex style
\bibliographystyle{plos2009}
% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother
% Leave date blank
\date{}
\pagestyle{myheadings}
\begin{document}
\title{Handling, archiving, and citing data in astronomy
}
\author{Alberto Pepe, Alyssa Goodman, August Muench, Merce Crosas, Christopher Erdmann}
\maketitle
\section{Abstract}
We report the results of interviews with astronomers at the Harvard-Smithsonian Center for Astrophysics.
\cite{http://adsabs.harvard.edu/abs/2011ApJ...743..201P} \cite{http://adsabs.harvard.edu/abs/2009Natur.457...63G}
\begin{quote}
No, I don't have a website where I store these data. Most of it is in various stages of mess. ---An Astronomer
\end{quote}
\section{Introduction}
Astronomers produce and peruse vast amounts of scientific data. Making
these data publicly available is important to enable both reproducible
research and long-term data curation and preservation (King, 1995, ``Replication, Replication'', Political Science and Politics, 28: 443-449). Because of
their sheer size, however, astronomical data are often left out
entirely from scientific publications and are thus hard to find and
obtain. In recent years, more and more astronomers have chosen to
store and make available their data on institutional repositories,
personal websites, and digital data libraries.
Just to show how citations work, here is a cite to Batista's work \cite{batista} and Leo Egghe's \cite{gindex}. While here is a citation which is not even in the bibliography file but it is on ADS so it can be cited by URL \cite{http://adsabs.harvard.edu/abs/2007prpl.conf..133G}.
We describe the use of personal data repositories as a means to enable
the publication of data by individual astronomy researchers. By repository we mean
In astronomy this accumulation might include the collection of bits of raw images taken at the telescope or subsets of processed data from a space observatory archive.
From this collection or pile of data, the data stack is distilled into new research objects. For example, raw spectra are calibrated and combined into a higher S/N data product.
These distilled products are further refined, even chopped up into smaller bits, where the relevant scientific information packet is much more highly concentrated; we consider such a packet of knowledge ``publishable''.
Consider this flow of information, then consider just how linear it appears to be.
The typical end of this evolution of accumulation and distillation of the research data is the publication.
There are a few problems with data objects appearing in papers: if they appear at all, they capture only the most refined research objects. They fork only with the paper. They avoid curation by domain-specific experts: the journals have neither a peer-review process nor an editorial process for ``data''.
Worse, they are not trackable in the papers. Even if they do have identifiers and even if those identifiers , these data products require a different framework for reuse.
By data materials, we mean any data product available
on the web which was either instrumental for the pursuit of research,
e.g., raw data from astronomical archives, or generated in the context
of research, e.g., reduced and processed data presented in a paper.
\section{Results}
\subsection{Exploratory analysis of data citation practices}
To begin, we mine a corpus of astronomy articles for external web
links. By ``external web link'' we mean: any outgoing link embedded in the
final published version of an article (e.g., its PDF or HTML format)
which points to an online resource in the \url{http} (or \url{https}) URI
scheme. The purpose of this exploratory analysis is to assess whether
astronomers use links within articles to point to datasets and related
supplemental data resources.
We analyze a corpus of all articles published in the four main
astronomy journals (The Astrophysical Journal, The Astrophysical Journal
Letters, The Astrophysical Journal Supplement, The Astronomical Journal) between
1997 and
2008. We find a total of $13447$ potential links to datasets in a
total of $7641$ publications. The detailed procedure by which
potential data links are selected and filtered is
described in the Materials and Methods section.
In the barplot of Figure \ref{fig:barplot} we show how linking
practices have changed over
time. Links to potential data resources in astronomy first appear in
1997, with only a couple of dozen links published in that year, and
their number increases quickly every year to reach around $1500$ links per year in
2005. After 2005, the total volume of published links stays roughly
the same every year. The graph shows that with widespread use and
adoption of the WWW, linking to online resources within published
articles has become more and more popular. The bars in the barplot of Figure
\ref{fig:barplot} also depict whether published links are still
available as of December 2011: the green portion of each bar represents
the volume of valid links (HTTP status code 200: OK), while the grey
portion of the bars represents broken links (HTTP status codes 3xx,
4xx, and 5xx). This link categorization shows that half or more of all
links published prior to 2001 are now broken. The percentage of broken
links decreases with time to reach roughly 10\% in 2008: one in ten links
included in astronomy papers in 2008 is unreachable three
years later.
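The link categorization above can be illustrated with a short Python sketch. It is not the script used for this analysis: the function names \texttt{is\_valid} and \texttt{broken\_percentage} and the sample status codes are hypothetical, and the only rule taken from the text is that status code 200 counts as valid while 3xx, 4xx, and 5xx count as broken.
\begin{verbatim}
def is_valid(status_code):
    """True only for HTTP 200 (OK); 3xx, 4xx, 5xx are broken."""
    return status_code == 200


def broken_percentage(status_codes):
    """Percentage of broken links among the given status codes."""
    if not status_codes:
        return 0.0
    broken = sum(1 for code in status_codes
                 if not is_valid(code))
    return 100.0 * broken / len(status_codes)


# Hypothetical status codes for ten links from a single year.
example = [200, 200, 404, 200, 200, 200, 200, 200, 200, 200]
print(broken_percentage(example))
\end{verbatim}
For the ten hypothetical links in the example, one of which returns 404, the sketch reports 10\% broken links.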
This analysis can be pushed further by exploring two distinct subsets
of the astronomy link corpus. In Figure \ref{fig:lines} we show how
the percentages of broken links differ over time for a set of $1801$ links to personal
websites (links which contain the tilde symbol, \textasciitilde, which
is usually reserved for personal web pages on institutional servers)
and a set of $3731$ links to institutional, curated archives (a manually
selected list of domains that are obvious astronomy archives, such as
\url{archive.stsci.edu}). Attempting to make a distinction between
these two categories of links is of crucial importance. The former set
of links, the ``tilde links'', are potential pointers to datasets
found on personal websites. These may consist of data tables and
images which are the product of data analysis and reduction procedures
described in the accompanying paper.
As such, they do not belong to larger curated archives, which
normally host raw data only. Ideally, these datasets would be included
in the full text of the article, but oftentimes they are too large to
fit within the format of a published paper and are included on a
personal server and linked from within the paper. The latter set of
links, the ``curated archives'' links, is instead a collection of
pointers to established archives and repositories, managed and curated
by institutions, surveys, and telescope sites. Authors may want to link to
these resources to cite and acknowledge the raw data sources that they employed in
their research. Figure \ref{fig:lines} shows that the availability of these
two categories of links follows very different, yet expected,
patterns. The vast majority of ``tilde links'' published between 1997
and 2003 are no longer available (personal links are depicted as
a black solid line and circles). Astronomers change locations, jobs,
institutions and, as such, their personal web servers change or expire
over time. However, the percentage of broken links to personal
websites falls rapidly: nearly all ``tilde links'' published in 2008
are still accessible today. A different scenario emerges when one
looks at the temporal pattern for links to curated archives
(depicted in the graph as a red line and crosses): the percentage of
broken links stays roughly the same over time (between 15\% and
20\%), indicating that curated, institutional websites are much less
vulnerable to temporal effects than personal websites.
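A minimal Python sketch of how links might be assigned to the two subsets is given below. It is an illustration under assumptions rather than the procedure actually used: the function name \texttt{link\_category} is hypothetical, the set of curated domains contains only the one example named in the text, links are assumed to include their scheme, and checking for the tilde first is an arbitrary choice.
\begin{verbatim}
from urllib.parse import urlparse

# Assumption: the study used a manually selected list of
# archive domains; only this one is named in the text.
CURATED_DOMAINS = {"archive.stsci.edu"}


def link_category(url):
    """Return 'personal', 'curated', or 'other' for a link."""
    if "~" in url:
        # Tilde links usually point to personal web pages.
        return "personal"
    host = urlparse(url).netloc.lower()
    if host in CURATED_DOMAINS:
        return "curated"
    return "other"


print(link_category("http://www.example.edu/~user/table.dat"))
print(link_category("http://archive.stsci.edu/hst/"))
\end{verbatim}
The first (hypothetical) link is labeled \texttt{personal} and the second \texttt{curated}; combining these labels with the status codes of the links would give the per-year percentages of broken links in each subset.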
This exploratory analysis reveals three key findings. First, since the
inception of the web in the early 1990s, astronomers have
increasingly used links in articles to cite datasets and other
resources which do not fit in the traditional referencing schemes for
bibliographic materials. Second, as for nearly every resource on the web,
availability of linked material decays with time: old links to
astronomical materials are more likely to be broken than more recent
ones. Third, links to ``personal datasets'', i.e., links to potential
data hosted on astronomers' personal websites, become unreachable much
faster than links to curated ``institutional datasets''.
These findings point to a preliminary realization: that astronomers do
have a need to reference and include data materials in their
published work. Since they lack a standardized mechanism to reference these resources ---
data citations do not normally fit in the
format, structure, and scope of published journal articles --- they
attempt to cite datasets using simple linking from within
articles. Results from this preliminary analysis prompted a
qualitative interview study, described below.
\subsection{Interview results}
We conducted interviews with a dozen astronomers of the
Harvard-Smithsonian Center for Astrophysics working in disparate
fields of astronomy and at different stages of
their careers: postdoctoral researchers, staff scientists, tenure-track
and tenured faculty. All interviews were conducted in person between
March and July, 2011.
The purpose of the interviews was to gather a first-hand account of
the needs and challenges of data referencing and archiving in
astronomy. Our interview rubric was loosely based on the Data Curation
Profiles Toolkit developed by the Distributed Data Curation Center at
the Purdue University Libraries and the Graduate School of Library and
Information Science at the University of Illinois at Urbana-Champaign
(\url{http://datacurationprofiles.org/}). Before every interview we
created a record of the interviewee which contained key information
such as name, academic role, affiliation, department, area of
specialization, website, as well as an annotated list of recent and/or
prominent astronomy projects pursued and published datasets, and
pointers to one or two recent published articles, possibly containing
links to datasets. The template for our
semi-structured interview consists of questions revolving around
these topics:
\begin{description}
\item[A story] We begin with a very open-ended question, asking
astronomers to tell us a story about their data. In the case of very
prolific authors, we ask them to focus their story around a specific
paper or project. We allow the researcher to discuss their
research, their data practices, their data output, their scientific
work flow, and their community of practice. With this first
question, we gauge potential projects and papers and we steer the
conversation towards a specific one, which becomes the subject of
the following questions.
\item[Generated output] What were the important stages of data
production, analysis and interpretation? Did you collect new data?
Archival data? How dependent are your results on the software tools
used in each stage of the data analysis? Did you create new
software?
\item[Availability] Are any/all of these data currently available for
download/perusal? If yes, where? What platform are you using? What stages, versions or types
of the data are available? If not, why not? Would you be happy to
make those data available?
\item[Data citation] How
can your data be cited/referenced? Can you pinpoint some publications that
were clearly based on these data? Are these publications on ADS?
\item[Format and size] Are the data available as separate files? What
formats are they in? How large are they?
\item[Ownership] What sort of licensing do you envision for your data?
Do you have contractual obligations and/or restrictions to preserve or share your data?
\item[Desired features] If your data were to be made available on a
platform that allows their storage, discovery, and citation, would
you want to offer visualizations of your data? Would you want to
allow users to run simple statistical analyses on your data? Would
you allow users to download the entire datasets or portions
thereof?
\end{description}
\subsubsection{Data stories}
During the interviews, we listened to a very diverse collection of data stories. In most cases, the stories were very much rooted not only
in the specific project that we were being told about, but in the data
practices of a given subdiscipline of astronomy. For example, an
interviewee working with quasars monitors and regularly publishes flux density
data which are used for calibration purposes. These data are
relatively limited in size and are hosted on an institutional
webserver:
\begin{quote}
There is a website which is essentially a flat ASCII file that has information for a particular day for a given number of quasars. I convert the raw data into a standard format with columns: source, date, time frequency, flux and error.
\end{quote}
Another
example is an interviewee working with galaxy clusters who told us that the amount of
data handled and processed in their research is so large that it involves the joint work
of many staff scientists and graduate students. Hosting and providing
access to the various levels of data involved in the production of the
final reduced data is beyond the capabilities of a single research
group. In their own words:
\begin{quote}
We could certainly put a data table in the publication with very
heavily digested quantities like velocity dispersion and number of
galaxies, but those things are derived from upstream raw data. You
would argue that it would be more value to the community if we were to
make the image archive available. I am probably not going to send all
the Magellan and HSST images to the ApJ though. But I could well
imagine twenty years in the future that that image archive has more
endured value than our attempt to extract information out of those
images.
\end{quote}
These two examples are telling of the differing scales at which data
practices operate: from small continually-updated datasets which are
currently hosted on personal webservers to large, collaboration-enabled
surveys whose data do not have an obvious home. Overall, we found
that the mechanisms by which data are used and handled differ widely
from project to project and between different subdomains and wavelengths.
\subsubsection{Generated output}
As for the previous question, the data products generated in the
context of different research endeavors, and their production
mechanisms, varied greatly between different projects. An
interviewee, for example, indicated that the source of their research
is entirely archival data and that the bulk of their research is
writing the software and running analyses with it:
\begin{quote}
We just used and combined catalog data from many different large area
surveys containing photometric description of different extragalactic sources (galaxies and
quasars): their magnitude, fluxes, and morphological parameters. Then
we subjected these large tables to some Machine Learning methods to estimate the redshift of the sources. The result was an augmented table which included additional information about estimates of photometric redshifts.
\end{quote}
In some other cases, astronomers were interested exclusively in the
scientific findings of their research; the mechanisms by which the data were
reduced and analyzed might not have been documented properly:
\begin{quote}
We didn't write software from scratch, but we used it in ways that
might not be so easily reproduced. That's what you read in the data
section of a paper when it says something like: \textit{we smoothed the data to
such and such a resolution and then we did this and then we did
that}. Whether the person [running the analysis] gets the
order of the steps right may actually affect the final outcome.
I am not sure whether these software workflows got perfectly documented.
\end{quote}
Despite the many types of data products generated, a visible thread of
similarity between responses can be found in the prominence of social and human factors involved in
the production of these data products. Interviewees often reported
that the various levels of data generated are entirely in the hands of
the people involved in the projects. An interviewee summarized the
prevalence of this practice as:
\begin{quote}
If we were rich and organized we would be like Sloan and we would have:
Data release 1.0, Data release 2.0, etc. But we have more like: Graduate student 1, Graduate student 2, Graduate student 3 (laughs)
\end{quote}
\subsubsection{Availability}
All the astronomers interviewed in this study state that
they are willing to share with the public all the reduced data
generated in the context of the discussed projects. Only two-thirds of
them, however, have gone through the effort of storing the data and
making it available online.
The vast majority of those who currently make their reduced
data available online choose to use a dedicated personal
webserver, generally accessible from the Principal Investigator's
personal website or group laboratory page. The flavors and levels of
data offered on these personal webservers differ greatly among
projects, however. Some astronomers limit themselves to posting the
minimum amount of data necessary to supplement a published article, or
to accommodate the requests of the referees to see the data. In some
other cases, astronomers post various levels of data, from raw to
reduced data. Yet, whether the amount and description of data supplied is sufficient
to entirely replicate a study is unclear and varies from case to
case. One astronomer admits that access to raw data is a barrier to
reproducibility of results:
\begin{quote}
Could we get the raw data from that survey? We did not archive the
totally raw unreduced data but there is a tape library somewhere with
all the data, but it would be difficult to find. And so I'd give you
maybe sixty percent odds that we could get that data now. Those raw
data were taken in 2001, 2003, 2004, and maybe some in 2005. I don't
even remember.
\end{quote}
Another astronomer working with raw data from a larger survey (Sloan
Digital Sky Survey) indicated that the raw data used in their study are indeed available
somewhere (on the SDSS archives), but expressed doubts about whether linking
raw to reduced data has any real utility:
\begin{quote}
How many people re-reduce SDSS images? I make a guess: there are
probably ten people on the face of earth that ever re-reduced Sloan
images.
\end{quote}
Only a couple of interviewed astronomers employed other
techniques to make the data available, which do not involve posting
data to a private webserver. For example, the catalogs of
photometric redshifts discussed earlier on were made available via dedicated services in
the VO framework (Virtual Observatory). They can
be accessed through the VO registry and through a number of popular
astronomy applications.
\subsubsection{Data citation}
Interviewees are also unsure about the best way that other
researchers can cite their data. If they have published a ``data
paper'', i.e., a refereed article describing the data, the data collection, and
analysis in detail, they prefer to receive a citation to the paper. In
all other cases, they are happy to receive just a mention of the data via
a URL link pointing to the data or an acknowledgement in the publication.
\begin{quote}
Journals don't seem to be concerned with standardizing that [how data are cited]. If you use the data from someone else's project then we just say we downloaded it from the archive. Sometimes people cite the program number and other times people go through the trouble of seeing if a paper has been published on it.
\end{quote}
\subsubsection{Format and size}
All astronomers unanimously indicated FITS (Flexible Image Transport
System) to be the data format of choice for all their data needs. As
one astronomer aptly summarized:
\begin{quote}
The FITS format does everything I need. It's hard to change. It is a
ubiquitous self-defining data structure. You can download one from 20
years ago and it still works.
\end{quote}
As for size, the spectrum was much more diversified, with some small
datasets, e.g., in the range of a few megabytes for quasar flux density
data, some medium-sized datasets, e.g., up to a dozen gigabytes in total
for the thermal emission data from the survey of star forming regions,
and some much larger archives on the order of many terabytes, e.g., for
galaxy cluster image data.
\subsubsection{Ownership}
Astronomy is a discipline whose subject matter,
celestial objects and astronomical phenomena, is
by definition in the public domain. This is probably why the inclination to
share data seems to be ingrained in the mindset of virtually all
astronomers.
None of the interviewed researchers indicated that the data were
``theirs'' or that they were bound by contractual agreements or
restrictions that would prevent them from sharing their reduced
data. All astronomers indicated that their data, no matter how reduced
and ingested from its original raw format, were public data. This
remark was stressed even more by two interviewed ``computational astronomers''
whose research is based on the aggregation and analysis of data in
existing astronomical catalogs:
\begin{quote}
We truly believe that sharing data is the right thing to do, simply
because the original data we used for this study was not ours. Our
study was only possible because other astronomers made their data
publicly available in the first place!
\end{quote}
\subsubsection{Desired features}
We asked astronomers whether they could think of any specific features
that an online hosting platform for their reduced data should have in
order to allow easy access, visualization, and analysis by users.
All respondents indicated that such a platform should, at the most basic
level, allow citation and download of the data. Another very basic
feature suggested by nearly every interviewee is the ability to select
and download only a subset of the data available for a specific
project, rather than the entire dataset. Thus, for example, a user should be able to
select a region of the sky delimited by coordinates (Right Ascension, Declination
and an angular radius) and download matching observations
for that region. For time-varying phenomena, the ability to subset by
temporal parameters was indicated. Only a small portion of the people interviewed indicated the need for
a more sophisticated filtering and subsetting mechanism, supported by a
strong query language and/or full interoperability with existing
frameworks, such as the VO registry.
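The positional subsetting requested by interviewees amounts to a cone selection. The Python sketch below shows one way it could work; it describes no existing platform, and the function names, the record layout (dictionaries with \texttt{ra} and \texttt{dec} keys in degrees), and the sample positions are all assumptions made for illustration.
\begin{verbatim}
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians,
                               (ra1, dec1, ra2, dec2))
    # Haversine formula, stable for small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2)
         * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def cone_select(observations, ra0, dec0, radius_deg):
    """Keep observations within radius_deg of (ra0, dec0)."""
    return [obs for obs in observations
            if angular_separation_deg(obs["ra"], obs["dec"],
                                      ra0, dec0) <= radius_deg]


# Hypothetical observation records.
observations = [
    {"ra": 10.0, "dec": 20.0},
    {"ra": 10.1, "dec": 20.0},
    {"ra": 50.0, "dec": -5.0},
]
print(len(cone_select(observations, 10.0, 20.0, 0.5)))
\end{verbatim}
With a radius of half a degree around Right Ascension 10 and Declination 20 degrees, the first two hypothetical records are selected and the third is not. A temporal subset could be expressed in the same way, as a filter on an observation date.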
Interestingly, none of the interviewed astronomers suggested that the
data hosting platform should feature advanced analysis and
visualization techniques.
\section{Discussion}
We find that astronomers are increasingly willing to reference and share the secondary or processed data sets used to derive the results in their publications. However, a common infrastructure to share this type of data set and guidelines for good practices on how to cite them are still lacking. This results in data references that become invalid over time and in incomplete publications which cannot be validated or built upon.
This group is involved in a project that has provided a solution to these problems in social science (refs), and is now in the process of being adapted to astronomy (theastrodata.org, seamless astronomy refs). The project, which uses the Dataverse Network software as the underlying infrastructure (refs), intends to achieve two main goals, both critical in data sharing:
1) a central repository where (small) astronomy data sets can be deposited and archived for long-term access, and
2) a data citation that includes a persistent identifier which links to the data, and which should be added to the references section of any publication.
The central repository does not serve merely as a file system to drop and access data files; it also provides the tools to understand the nature of the data sets and how they can be reused. It accomplishes this by allowing users to add descriptive metadata about the data set and complementary files such as documentation and code, and by extracting metadata automatically from the data file. It also provides the infrastructure to replicate the data files to multiple locations and export the metadata to make the data sets more easily discoverable by other systems.
A formal data citation is the other key piece of data sharing. It provides a persistent link between the publication and the data set, so that if the location of the data set changes in the future, the persistent link can still be resolved to the same data set (ref. to Handles). It also provides attribution to the various contributors (authors and data producers or providers), properly giving credit to those who collected and processed the data. Finally, a formal, standardized data citation is needed to facilitate the adoption of data citation by publishers: it is critical that citations of this type become part of the references sections in publications, and that they are easily traceable so that their impact can be measured.
\section{Materials and Methods}
We analyze a corpus of all articles published between 1997 and 2008 in the four main
astronomy journals (The Astrophysical Journal, The Astrophysical Journal
Letters, The Astrophysical Journal Supplement, The Astronomical Journal) which
contain external URL links in their full text. We initially find $33847$ external links
in $13390$ articles. \url{http://hdl.handle.net/10904/10214} \cite{astrocite}
In order to isolate potential links to datasets from this list, we
perform the following filtering workflow. First, we remove links to
domains that are scholarly repositories and which obviously do not
host data (or which did not host data prior to 2008). These include
domains such as \url{dx.doi.org}, \url{arxiv.org}, \url{xxx.lanl.gov},
and \url{adsabs.harvard.edu}. Removing links to these domains, which
are obviously pointers to articles, narrows down the corpus to
$26663$.
Second, we remove all links which are found in the reference list of
an article. While it is entirely possible that authors cite datasets in the
same way as they cite bibliographic references, an exploratory analysis revealed that links
in the reference section of a paper were, by and large, pointers to articles, preprints,
star catalogs, circulars, manuals, and user guides. Therefore, we
remove these ``reference links'', bringing the corpus down to $20767$
links.
Third, based on the observation that links to datasets are generally
not found at the root of a website hierarchy, we remove links that
contain fewer than two forward slashes (other than the two slashes found in
the leading ``http://''). For example, the link to
\url{http://www.sdss.org} is dropped from the corpus (0 slashes),
while the link to
\url{http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html}
is retained (3 slashes). This final filtering procedure reduces the
corpus to $13447$ links, which we consider potential links to datasets. \cite{astrocite} Some descriptive statistics about this corpus
of links are presented in Table \ref{tab1}.
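The filtering steps can be sketched in Python as follows. This is a hedged illustration, not the code used to build the corpus: the function names are hypothetical, hosts are matched exactly against the four domains named above, and whether a link appears in a reference list is assumed to be supplied by the caller as the flag \texttt{in\_reference\_list}.
\begin{verbatim}
from urllib.parse import urlparse

# Scholarly repository domains named in the text.
SCHOLARLY_DOMAINS = {
    "dx.doi.org",
    "arxiv.org",
    "xxx.lanl.gov",
    "adsabs.harvard.edu",
}


def count_path_slashes(url):
    """Count slashes other than the two in the leading scheme."""
    return url.split("://", 1)[-1].count("/")


def is_potential_data_link(url, in_reference_list):
    """Apply the three filters described in the text."""
    host = urlparse(url).netloc.lower()
    if host in SCHOLARLY_DOMAINS:   # step 1: article repositories
        return False
    if in_reference_list:           # step 2: reference links
        return False
    return count_path_slashes(url) >= 2   # step 3: not a site root


print(is_potential_data_link("http://www.sdss.org", False))
print(is_potential_data_link(
    "http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html",
    False))
\end{verbatim}
Applied to the two examples given above, the sketch rejects the link to the root of \url{www.sdss.org} (0 slashes) and keeps the link to the COMPLETE data page (3 slashes).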
\subsection{Acknowledgments}
We thank Michael Blake and Tomoko Kurahashi who helped with
interviews, transcription, and coding, and with data curation,
respectively. We also thank Alberto Accomazzi, Jay Luker, and the Astrophysics Data System team
at the Harvard-Smithsonian Center for Astrophysics for providing
access to the bibliographic data used for the exploratory data
citation analysis.
\subsection{Figures}
\begin{figure}[tb]
\includegraphics[width=\columnwidth]{figures/figure1/figure1.jpg}
\caption{\textbf{Volume of potential data links in astronomy publications.} Total volume
of external links in all articles published between 1997 and 2008 in
the four main astronomy journals, color coded by HTTP status
code. Green bars represent accessible links (200), grey bars represent
broken links.}
\label{fig:barplot}
\end{figure}
\begin{figure}[tb]
\includegraphics[width=\columnwidth]{figures/figure2/figure2.jpg}
\caption{\textbf{Percentage of broken links in astronomy publications according to
type of website.} Percentages
of broken external links in all articles published between 1997 and 2008 in
the four main astronomy journals. Black circles represent links to
personal websites (link values contain the tilde symbol, \textasciitilde), while
red crosses represent links to curated archives such as governmental
and institutional repositories.}
\label{fig:lines}
\end{figure}
\subsection{Tables}
\begin{table}
\caption{\textbf{Some descriptive statistics about top domains linked in astronomy publications.} This table lists the total number of links and broken links (HTTP status codes 3xx, 4xx, and 5xx) to top domains (domains with
over 100 links) found within articles published in
the four main astronomy journals between 1997 and 2008. The table also
shows, for each domain, the number of links to common filename
extensions, as well as links that contain the tilde character.}
\begin{tabular}{l|cccccccc}
\hline
\textbf{Domain}&\textbf{links (broken)}&\textbf{.html}&\textbf{.txt}&\textbf{.dat}&\textbf{.gz}&\textbf{.tar}&\textbf{.fits}&\textbf{tilde}\\
\hline\hline
cxc.harvard.edu&802 (110)&336 (70)&0&0&4 (2)&5 (4)&1&0\\
heasarc.gsfc.nasa.gov&640 (33)&423 (27)&1&0&0&0&0&0\\
www.stsci.edu&498 (61)&205 (29)&3&0&0&0&0&15 (10)\\
asc.harvard.edu&471 (152)&212 (99)&0&0&0&0&0&1 (1)\\
ssc.spitzer.caltech.edu&427 (194)&125 (76)&3 (3)&0&0&0&0&0\\
cfa-www.harvard.edu&352 (68)&277 (52)&1&0&0&0&0&54 (17)\\
archive.stsci.edu&308 (58)&57 (9)&2&1 (0)&0&0&0&0\\
www.ipac.caltech.edu&285 (14)&209 (12)&0&0&0&0&0&0\\
www.atnf.csiro.au&211 (21)&12 (6)&0&0&0&0&0&7 (5)\\
space.mit.edu&193 (10)&58 (5)&1&0&0&0&0&2 (1)\\
www.astro.psu.edu&186 (4)&103 (1)&1&10&1&1&0&2\\
www.eso.org&186 (58)&54 (22)&1 (1)&0&0&0&0&4 (1)\\
irsa.ipac.caltech.edu&163 (5)&38&0&0&1&0&0&0\\
www.sdss.org&156 (2)&106 (1)&0&0&0&0&0&0\\
hea-www.harvard.edu&125 (37)&42 (17)&1&0&0&1&0&26 (16)\\
physics.nist.gov&125 (3)&63 (2)&0&0&0&0&0&0\\
www.noao.edu&120 (3)&50 (2)&0&0&0&0&0&0\\
xmm.vilspa.esa.es&118 (35)&23 (19)&0&0&8 (1)&0&0&1 (1)\\
www.astro.princeton.edu&115 (31)&43 (14)&0&0&0&0&0&53 (12)\\
ad.usno.navy.mil&110 (27)&98 (22)&3 (3)&0&0&0&0&1 (1)\\
\end{tabular}
\label{tab1}
\end{table}
\bibliography{bibliography/biblio}
\end{document}