diff --git a/discussion.tex b/discussion.tex
index 694c077..0a67040 100644
--- a/discussion.tex
+++ b/discussion.tex
...
\section{Discussion}
With this study we found that, overall, astronomers are increasingly willing to reference and share the secondary or processed data sets used to derive the results in their publications. However, these same astronomers have failed to embrace a common infrastructure to share these types of data sets.
It is also unclear whether this finding arises because such infrastructure is lacking or because it is unknown to (or untenable as a solution for) most astronomers. Interestingly, astronomy, as a field, has \textbf{pioneered} the creation of international initiatives for the collection, organization, and sharing of data.
Large archives that serve primary data sets have embraced the "virtual observatory" (VO) concept for more than a decade. Yet astronomy's failure to provide a data sharing solution for smaller derived data sets is worth a deeper discussion in light of our survey results.
\subsection{The virtual observatory}
Focusing on efforts in the United States to facilitate a virtual observatory, we note that the 2000 decadal review by the National Research Council called for the creation of a "National Virtual Observatory" as its highest priority among small initiatives. It was enacted with a grant from the National Science Foundation in 2001, entitled "Building the Framework for the National Virtual Observatory." (See \url{http://virtualobservatory.org/whatis/history.aspx} for a history of the US Virtual Observatory efforts.) The grant essentially implemented a vision for sharing astronomy data online put forward in a \textit{Science} article about "The WorldWide Telescope" by Szalay and Gray in 2001 \cite{2001Sci...293.2037S}.
The scope of this research was broad, including standards development and professional outreach to scientists (see \cite{vobook}). In 2010, NASA and NSF reached a cooperative agreement to fund and maintain a US Virtual Astronomical Observatory, implementing the research done under the 2001 Framework grant as a formal structure for tool and standards development, as well as a venue for professional and public outreach about the VO. Unfortunately, NSF announced plans (now being implemented) to de-fund its (80\%) share of the US VAO, leading to a cessation of the US VAO in September 2014. Opinions on why and how this happened are beyond the scope of this paper. What is important for our purposes is to point out that
1) the scope for both the NVO and VAO efforts skewed toward serving large, homogeneous datasets; 2) the most robust, important and adopted infrastructure-related efforts of the VAO, like the VO "Registry" essential for tools to find data, are not at all secure from funding
cuts. These two facts, we feel, have undermined the ability of the VO to serve the data sharing needs of astronomers while also putting doubt in the minds of astronomers thinking about doing extra work to share their data.
The extent to which these US-based VO efforts were successful is hard to measure. Certainly, large archives of primary data have embraced standards-based data access and sharing. It is
these rich interfaces that enable the creation of the kinds of data aggregation tools envisioned by Szalay and Gray. Some tools, such as
the recently released US VAO Data Discovery tool (\url{http://vao.stsci.edu/portal/Mashup/Clients/Portal/DataDiscovery.html}), could not exist without VO tools like the "Registry" and data access protocols that
have been adopted by the archives. In 2008, Microsoft Research released a free software package named "WorldWide Telescope" (WWT), in honor of Szalay and Gray's 2001 vision. Today, WWT, which uses a large amount of infrastructure established under the NVO and VAO grants and connects to many services developed outside the US (under the "International Virtual Observatory Alliance" standards), is probably the best US-origin implementation of the virtual observatory vision of connected datasets. The combination of tools offered by the Centre de Donnees astronomiques de Strasbourg (CDS; \url{http://cds.u-strasbg.fr}) also offers excellent access to VO services. Many data sets from NASA and other large survey providers are available within WWT and CDS tools, and astronomers can offer their own data in these frameworks as well, but uptake is still slower than one might imagine. Again, as one cascades to smaller datasets, the adoption of the VO/WWT frameworks for data sharing declines. One example of a medium-size survey (COMPLETE; see \url{http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html}) being served at a research group's web site using an HTML5 WWT client is at \url{http://www.worldwidetelescope.org/complete/wwtcoveragetool5.htm}. A summary of the usage and functionality of WWT in research and education is offered in \citet{2012ASPC..461..267G}.
Yet the vision
that the VO would provide a "virtual sky based on the enormous data sets being created now and the even larger ones proposed for the future" that could "enable a new mode of research for professional astronomers and will provide to the public an unparalleled opportunity for education and discovery" \cite{vobook} has not been met.
The vision of Szalay and Gray that "all astronomy data and literature will soon be online and accessible via the
Internet" has not been achieved, primarily because of a lack of focus on the smaller derived data sets created by astronomers, which we show are shared by email, FTP, or personal websites, to the detriment of these data. Despite the existence of global infrastructure initiatives, supported by a mix of government and corporate funds, and despite the publication of numerous guidelines and principles on the topic
\cite{citationprinciples, tenrules}, the practices of data sharing, data archiving, and data citation in the astronomical community are far from being widely known.
\subsection{The Dataverse Network}
...
A Dataverse Network consists of dataverses, and each dataverse can be branded or customized for an individual researcher, group, project, or journal. A dataverse owner has control over the branding, the metadata, and the sharing and release of the data, and can thus completely manage their own virtual data archive, while all data are stored in a centralized, public research data repository that guarantees proper archival and long-term access. The Dataverse Network follows good practices for scientific data publication: it 1) supports metadata standards and enables the inclusion of accompanying code and other materials for each dataset, 2) provides versioning of a dataset, with easy access to previous versions of the data and metadata, and 3) assigns a persistent identifier (DOI) and generates a full data citation, with attribution to data authors and distributors \cite{AltmanKing2007}. The generated data citation follows the recently proposed principles for data citation, an international initiative which recognizes that 'data should be considered legitimate, citable products of research' \cite{citationprinciples}. Once a dataset is released for publication, it cannot be unreleased; this guarantees that the data citation, and its persistent URL, can always be resolved to a data page that includes sufficient information about the dataset and access to the data files. In some uncommon cases, a dataset might be deaccessioned due to a retraction or legal issue, but even in these cases, the persistent identifier in the data citation will still resolve to a page with information about the missing dataset.
\subsection{The Astronomy Dataverse Network}
After an analysis of existing Dataverse Network repositories (most of which host social science data), we discovered that the Dataverse Network software could be slightly adapted and repurposed to host astronomical data. This adaptation consisted of two main enhancements to the Dataverse software: 1) a flexible, extensible metadata schema that could support fields typically needed to describe a dataset in astronomy, and 2) deep search for FITS files, that is, indexing the header information of FITS files to facilitate discovery of such files. Both enhancements are under continued development, as the Dataverse team receives feedback from the astronomy community through usability testing and iterations of the software. The metadata will be further enhanced in version 4 of the project, following standards from generic VAO metadata fields.