August Muench edited discussion.tex  over 10 years ago

Commit id: f03df3946b4962ad13591e6cac0ce9a34efeae5d

\section{Discussion}

With this study we found that, overall, astronomers are increasingly willing to reference and share the secondary or processed data sets used to derive the results in their publications. However, these same astronomers have failed to embrace a common infrastructure to share these types of data sets. It is also unclear whether this finding arises because such infrastructure is lacking or because it is unknown to (or untenable as a solution for) most astronomers. Interestingly, astronomy, as a field, has \textbf{pioneered} the creation of international initiatives for the collection, organization, and sharing of data. Large archives that serve primary data sets have embraced the "virtual observatory" (VO) concept for more than a decade. Yet astronomy's failure to provide a data sharing solution for smaller derived data sets is worth a deeper discussion in light of our survey results.

\subsection{The virtual observatory}

Focusing on efforts in the United States to facilitate a virtual observatory, we note that the 2000 decadal review by the National Research Council called for the creation of a "National Virtual Observatory" as its highest priority among small initiatives. It was enacted with a grant from the National Science Foundation in 2001, entitled "Building the Framework for the National Virtual Observatory." (See \url{http://virtualobservatory.org/whatis/history.aspx} for a history of the US Virtual Observatory efforts.) The grant essentially implemented a vision for sharing astronomy data online put forward in a \textit{Science} article about "The WorldWide Telescope" by Szalay and Gray in 2001 \cite{2001Sci...293.2037S}. The scope of this research was broad, including standards development and professional outreach to scientists (see \cite{vobook}).
In 2010, NASA and NSF reached a cooperative agreement to fund and maintain a US Virtual Astronomical Observatory (VAO), implementing the research done under the 2001 Framework grant as a formal structure for tool and standards development, as well as a venue for professional and public outreach about the VO. Unfortunately, NSF announced plans (now being implemented) to de-fund its (80\%) share of the US VAO, leading to a cessation of the US VAO in September 2014. Opinions on why and how this happened are beyond the scope of this paper. What is important for our purposes is to point out that 1) the scope of both the NVO and VAO efforts skewed toward serving large, homogeneous datasets; and 2) even the most robust, important, and widely adopted infrastructure-related efforts of the VAO, like the VO "Registry" essential for tools to find data, are not at all secure from funding cuts. These two facts, we feel, have undermined the ability of the VO to serve the data sharing needs of astronomers while also putting doubt in the minds of astronomers thinking about doing extra work to share their data.

The extent to which these US-based VO efforts were successful is hard to measure. Certainly, large archives of primary data have embraced standards-based data access and sharing. It is these rich interfaces that enable the creation of the kinds of data aggregation tools envisioned by Szalay \& Gray. Some tools, such as the recently released US VAO Data Discovery tool (\url{http://vao.stsci.edu/portal/Mashup/Clients/Portal/DataDiscovery.html}), could not exist without VO tools like the "Registry" and the data access protocols that have been adopted by the archives.

In 2008, Microsoft Research released a free software package named "WorldWide Telescope" (WWT), in honor of Szalay and Gray's 2001 vision.
Today, WWT, which uses a large amount of infrastructure established under the NVO and VAO grants, and connects to many services developed outside the US (under the "International Virtual Observatory Alliance" standards), is probably the best US-origin implementation of the virtual observatory vision of connected datasets. The combination of tools offered by the Centre de Donnees astronomiques de Strasbourg (CDS; \url{http://cds.u-strasbg.fr}) also offers excellent access to VO services. Many data sets from NASA and other large survey providers are available within WWT and CDS tools, and astronomers can offer their own data in these frameworks as well, but uptake is still slower than one might imagine. Again, as one cascades to smaller datasets, the adoption of the VO/WWT frameworks for data sharing declines. One example of a medium-size survey (COMPLETE; see \url{http://www.cfa.harvard.edu/COMPLETE/data_html_pages/data.html}) being served at a research group's web site using an HTML5 WWT client is at \url{http://www.worldwidetelescope.org/complete/wwtcoveragetool5.htm}. A summary of the usage and functionality of WWT in research and education is offered in \citet{2012ASPC..461..267G}. Yet the vision that the VO will provide a "virtual sky based on the enormous data sets being created now and the even larger ones proposed for the future" that could "enable a new mode of research for professional astronomers and will provide to the public an unparalleled opportunity for education and discovery" \cite{vobook} was not met.

The vision of Szalay \& Gray that "all astronomy data and literature will soon be online and accessible via the Internet" has not been achieved, primarily because of a lack of focus on the smaller derived data sets created by astronomers, which we show are shared by email, FTP, or personal websites, to the detriment of these data. Despite the existence of global infrastructure initiatives, led by a mix of government and corporate funds, and despite the publication of numerous guidelines and principles on the topic \cite{citationprinciples, tenrules}, the practices of data sharing, data archiving, and data citation in the astronomical community are far from being widely known.

\subsection{The Dataverse Network}

A Dataverse Network consists of dataverses, and each dataverse can be branded or customized for an individual researcher, group, project, or journal. A dataverse owner has control over the branding, the metadata, and the sharing and release of the data, and can thus completely manage their own virtual data archive, while all data are stored in a centralized, public research data repository that guarantees proper archival and long-term access. The Dataverse Network follows good practices for scientific data publication: it 1) supports metadata standards and enables the inclusion of accompanying code and other materials for each dataset, 2) provides versioning of a dataset, with easy access to previous versions of the data and metadata, and 3) assigns a persistent identifier (DOI) and generates a full data citation, with attribution to data authors and distributors \cite{AltmanKing2007}. The generated data citation follows the recently proposed principles for data citation, an international initiative which recognizes that 'data should be considered legitimate, citable products of research' \cite{citationprinciples}. Once a dataset is released for publication, it cannot be unreleased; this guarantees that the data citation, and its persistent URL, can always be resolved to a data page that includes sufficient information about the dataset and access to the data files. In some uncommon cases, a dataset might be deaccessioned due to a retraction or legal issue, but even in these cases, the persistent identifier in the data citation will still resolve to a page with information about the missing dataset.

\subsection{The Astronomy Dataverse Network}

After an analysis of existing Dataverse Network repositories (most of which host social science data), we discovered that the Dataverse Network software could be slightly adapted and repurposed to host astronomical data.
This adaptation consisted of two main enhancements to the Dataverse software: 1) a flexible, extensible metadata schema that could support fields typically needed to describe a dataset in astronomy, and 2) deep search for FITS files, that is, indexing FITS file header information to facilitate discovery of such files. Both enhancements are in continued development, as the Dataverse team receives feedback from the astronomy community through usability testing and iterations of the software. The metadata will be further enhanced in version 4 of the project, following standards from generic VAO metadata fields.
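
To make the idea of indexing FITS header information concrete, the following is a minimal sketch and not the Dataverse implementation. It assumes only the header layout defined by the FITS standard (80-character cards packed into 2880-byte blocks and terminated by an \texttt{END} card) and uses the Python standard library alone. All function names, file names, and header values below are hypothetical and chosen purely for illustration; a production indexer would more likely rely on an established FITS library and would also handle extensions, \texttt{CONTINUE} cards, and typed values.

```python
import io

CARD_LEN = 80     # each FITS header card is 80 ASCII characters
BLOCK_LEN = 2880  # headers are padded to a multiple of 2880 bytes


def parse_value(field):
    """Return the value part of a card as a plain string (comments dropped)."""
    field = field.strip()
    if not field.startswith("'"):
        # Non-string value: everything before the optional "/ comment".
        return field.split("/", 1)[0].strip()
    # String value: runs to the closing quote; '' is an escaped quote.
    chars, pos = [], 1
    while pos < len(field):
        if field[pos] == "'":
            if pos + 1 < len(field) and field[pos + 1] == "'":
                chars.append("'")
                pos += 2
                continue
            break
        chars.append(field[pos])
        pos += 1
    return "".join(chars).rstrip()


def read_primary_header(stream):
    """Read keyword/value pairs from the primary header of a binary stream."""
    header = {}
    while True:
        block = stream.read(BLOCK_LEN)
        if len(block) < BLOCK_LEN:
            raise ValueError("truncated FITS header: END card not found")
        for start in range(0, BLOCK_LEN, CARD_LEN):
            card = block[start:start + CARD_LEN].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header
            # Only cards with "= " in columns 9-10 carry a value;
            # COMMENT, HISTORY, and blank cards are skipped.
            if card[8:10] == "= ":
                header[keyword] = parse_value(card[10:])


def build_index(files):
    """Map each (keyword, value) pair to the set of file names containing it."""
    index = {}
    for name, stream in files.items():
        for keyword, value in read_primary_header(stream).items():
            index.setdefault((keyword, value), set()).add(name)
    return index


def make_header(cards):
    """Build a synthetic, correctly padded header (demo helper only)."""
    text = "".join(card.ljust(CARD_LEN) for card in cards + ["END"])
    n_blocks = -(-len(text) // BLOCK_LEN)  # ceiling division
    return io.BytesIO(text.ljust(n_blocks * BLOCK_LEN).encode("ascii"))


# Hypothetical files with made-up header values, for illustration only.
files = {
    "a.fits": make_header([
        "SIMPLE  =                    T / conforms to FITS",
        "OBJECT  = 'Perseus '           / target name",
        "TELESCOP= 'DEMO    '",
    ]),
    "b.fits": make_header([
        "SIMPLE  =                    T",
        "OBJECT  = 'Ophiuchus'",
    ]),
}
index = build_index(files)
```

With such an index, a query like \texttt{index[("OBJECT", "Perseus")]} yields the names of the files whose header carries that keyword and value, which is the kind of lookup that header-based discovery relies on. Storing values as plain strings keeps the sketch short; a real service would presumably normalize types and units before indexing.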