Alberto Pepe

and 1 more

If you are a scholar and haven't spent the last ten years in a vacuum, you have heard of Open Access: the emerging practice of providing unrestricted access to peer-reviewed scholarly works, such as journal articles, conference papers, and book chapters. Open Access comes in two flavors, _green_ and _gold_, depending on how an article is made available to the public. The "green road" to Open Access involves an author making her article publicly available after publication, e.g. by depositing the article's post-print in an open institutional repository. Many, however, consider the "golden road" the preferred avenue to Open Access: here an author publishes an article directly in an OA journal. THE FACT THAT OPEN ACCESS, REGARDLESS OF ITS FLAVOR, HAS INNUMERABLE BENEFITS FOR RESEARCHERS AND THE PUBLIC AT LARGE IS BEYOND DISCUSSION; even the most traditional scholarly publishers would have to agree. Importantly, the vision of universal Open Access to scholarly knowledge, i.e., the idea that the entire body of published scholarship should be made available to everyone free of charge, is not too far-fetched. In practice, through a combination of green and gold OA practices, this vision is already a reality in some scientific fields, such as physics and astronomy. So: Open Access is both fundamentally necessary and bound to happen. BUT WHETHER OPEN ACCESS, ALONE, CAN GUARANTEE REPRODUCIBILITY AND TRANSPARENCY OF RESEARCH RESULTS IS A DIFFERENT AND COMPELLING QUESTION. Do research articles contain enough information to exactly (or even approximately) replicate a scientific study? Unfortunately, the answer to this question is very often no. As science, and scholarship in general, inevitably become more computational in nature, the experiments, calculations, and analyses performed by researchers are too many and too complex to be described in detail in a research article.
As a result, the minutiae of research activity are often hidden from view, making science unintelligible and irreproducible, not only for the public at large but also for scientists and experts and, paradoxically, even for the scientists who conducted the research in the first place, who may not have documented their exact workflows elsewhere. A movement parallel to Open Access, Open Science, is building momentum in scholarly circles. Its mission is to provide open, universal access to the full sources of scientific research.
Link volume

Alberto Pepe

and 4 more

We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. This rise indicates an increased interest in data sharing over the same period in which the web saw its most dramatic growth in usage in the developed world. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers’ personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers’ current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data sharing among astronomers at this institution, and nearly all astronomers would share as much of their data as others wanted if it were practicable. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; overreliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to date.
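Once each embedded link has been checked, the decay measurement above reduces to a simple tally. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes every extracted link has already been tested for reachability (for example, by one HTTP request made on a single date) and labeled with its publication year and a host type such as "personal" or "institutional". The field names, the `broken_fraction` helper, and the toy records are hypothetical.

```python
from collections import Counter


def broken_fraction(records, by):
    """Fraction of unreachable links, grouped by the field named `by`.

    Each record is a dict with a boolean "reachable" field (the result of a
    hypothetical availability check) plus grouping fields such as
    "year" or "host_type".
    """
    total = Counter()
    broken = Counter()
    for rec in records:
        group = rec[by]
        total[group] += 1
        if not rec["reachable"]:
            broken[group] += 1
    # Sorted keys give a stable, readable ordering (years ascending, types A-Z).
    return {g: broken[g] / total[g] for g in sorted(total)}


# Toy records invented for illustration; these are NOT the study's data.
toy = [
    {"year": 2001, "host_type": "personal", "reachable": False},
    {"year": 2001, "host_type": "institutional", "reachable": True},
    {"year": 2001, "host_type": "personal", "reachable": False},
    {"year": 2001, "host_type": "institutional", "reachable": True},
    {"year": 2010, "host_type": "personal", "reachable": True},
    {"year": 2010, "host_type": "institutional", "reachable": True},
]

print(broken_fraction(toy, by="year"))  # {2001: 0.5, 2010: 0.0}
print(broken_fraction(toy, by="host_type"))
```

Grouping by `host_type` instead of `year` is how one would compare personal-website links against curated institutional ones; the same records serve both views.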
Galileo

Alberto Pepe

and 1 more

INTRODUCTION
In the early 1600s, Galileo Galilei turned a telescope toward Jupiter. In his logbook each night, he drew to-scale schematic diagrams of Jupiter and some oddly moving points of light near it. Galileo labeled each drawing with the date. Eventually he used his observations to conclude that the Earth orbits the Sun, just as the four Galilean moons orbit Jupiter. History shows Galileo to be much more than an astronomical hero, though. His clear and careful record keeping and publication style not only let Galileo understand the Solar System but also continue to let _anyone_ understand _how_ Galileo did it. Galileo’s notes directly integrated his DATA (drawings of Jupiter and its moons), key METADATA (timing of each observation, weather, telescope properties), and TEXT (descriptions of methods, analysis, and conclusions). Critically, when Galileo included the information from those notes in _Sidereus Nuncius_, this integration of text, data and metadata was preserved, as shown in Figure 1. Galileo's work advanced the "Scientific Revolution," and his approach to observation and analysis contributed significantly to the shaping of today's "Scientific Method." Today most research projects are considered complete when a journal article based on the analysis has been written and published. Trouble is, unlike Galileo's report in _Sidereus Nuncius_, the amount of real data and data description in modern publications is almost never sufficient to repeat or even statistically verify the study being presented. Worse, researchers wishing to build upon and extend work presented in the literature often have trouble recovering the data associated with an article after it has been published. More often than scientists would like to admit, they cannot even recover the data associated with their own published works. Complicating the modern situation, the words "data" and "analysis" have a wider variety of definitions today than at the time of Galileo.
Theoretical investigations can create large "data" sets through simulations (e.g. the Millennium Simulation Project). Large-scale data collection often takes place as a community-wide effort (e.g. the Human Genome Project), which leads to gigantic online "databases" (organized collections of data). Computers are so essential in simulations, and in the processing of experimental and observational data, that it is often hard to draw a dividing line between "data" and "analysis" (or "code") when discussing the care and feeding of "data." Sometimes, a copy of the code used to create or process data is so essential to the use of those data that the code should almost be thought of as part of the "metadata" description of the data. Other times, the code used in a scientific study is more separable from the data, but even then, many preservation and sharing principles apply to code just as well as they do to data. So how do we go about caring for and feeding data? Extra work, no doubt, is associated with nurturing your data, but care up front will save time and increase insight later. Even though a growing number of researchers, especially in large collaborations, know that conducting research with sharing and reuse in mind is essential, doing so still requires a paradigm shift. Most people are still motivated by piling up publications and by getting to the next one as soon as possible. But the more we scientists find ourselves wishing we had access to extant but now unfindable data, the more we will realize why bad data management is bad for science. How can we improve? THIS ARTICLE OFFERS A SHORT GUIDE TO THE STEPS SCIENTISTS CAN TAKE TO ENSURE THAT THEIR DATA AND ASSOCIATED ANALYSES CONTINUE TO BE OF VALUE AND TO BE RECOGNIZED.
In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more, but our goal here is _not_ to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to "care for and feed" data, with some practical advice on how to do that. The set of Appendices at the close of this work offers links to the types of services referred to throughout the text. BOLDFACE LETTERING below highlights actions one can take to follow the suggested rules.
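One concrete way to keep data, metadata, and the code that produced them together is to write a small machine-readable "sidecar" file next to each data file. This follows the spirit of Galileo's integrated notes and of treating code as part of the metadata. The sketch below is a hypothetical example, not a community standard: the `.meta.json` naming, the field names, and the `write_sidecar` helper are all assumptions. It records SHA-256 checksums so that a later reader can confirm that the data and code on disk are the versions the record describes.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path):
    """Checksum a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_sidecar(data_path, code_path, description, **extra):
    """Write a hypothetical <data_path>.meta.json linking data, code, and metadata.

    `extra` holds free-form metadata (e.g. instrument, observing conditions).
    Returns the path of the sidecar file.
    """
    record = {
        "data_file": data_path,
        "data_sha256": sha256_of(data_path),
        "code_file": code_path,              # the script that produced/processed the data
        "code_sha256": sha256_of(code_path),
        "description": description,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        **extra,
    }
    sidecar = data_path + ".meta.json"
    with open(sidecar, "w") as f:
        json.dump(record, f, indent=2)
    return sidecar


# Example call (file names are hypothetical):
# write_sidecar("jupiter_positions.csv", "reduce.py",
#               "Nightly positions of moons near Jupiter", telescope="small refractor")
```

A richer, established metadata schema could replace the ad hoc fields here; the essential design choice is that the checksums tie a specific data file to a specific version of the code.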
1nessie findingchart

Alyssa Goodman

and 10 more

ABSTRACT
The very long, thin infrared dark cloud (IRDC) Nessie is even longer than had been previously claimed, and an analysis of its Galactic location suggests that it lies directly in the Milky Way’s mid-plane, tracing out a highly elongated bone-like feature within the prominent Scutum-Centaurus spiral arm. Re-analysis of mid-infrared imagery from the Spitzer Space Telescope shows that this IRDC is at least 2, and possibly as many as 8, times longer than had originally been claimed by Nessie’s discoverers; its aspect ratio is therefore at least 150:1, and possibly as large as 800:1. A careful accounting for both the Sun’s offset from the Galactic plane (∼25 pc) and the Galactic center’s offset from the $(l^{\rm II}, b^{\rm II}) = (0, 0)$ position defined by the IAU in 1959 shows that the latitude of the true Galactic mid-plane at the 3.1 kpc distance to the Scutum-Centaurus Arm is not $b = 0^\circ$, but instead closer to $b = -0.5^\circ$, which is the latitude of Nessie to within a few pc. Apparently, Nessie lies _in_ the Galactic mid-plane. An analysis of the radial velocities of low-density (CO) and high-density (${\rm NH}_3$) gas associated with the Nessie dust feature suggests that Nessie runs along the Scutum-Centaurus Arm in position-position-velocity space, which means it likely forms a dense ‘spine’ of the arm in real space as well. No galaxy-scale simulation to date has the spatial resolution to predict a Nessie-like feature, but extant simulations do suggest that highly elongated over-dense filaments should be associated with a galaxy’s spiral arms. Nessie is situated in the closest major spiral arm to the Sun toward the inner Galaxy, and appears almost perpendicular to our line of sight, making it the easiest feature of its kind to detect from our location (a shadow of an Arm’s bone, illuminated by the Galaxy beyond).
Although the Sun’s offset from the Galactic plane (∼25 pc) is not large in comparison with the half-thickness of the plane as traced by Population I objects such as GMCs and HII regions (∼200 pc), it may be significant compared with an extremely thin layer that might be traced out by Nessie-like “bones” of the Milky Way. Future high-resolution extinction and molecular line data may therefore allow us to exploit the Sun’s position above the plane to gain a (very foreshortened) view “from above” of dense gas in the Milky Way’s disk and its structure.
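The Sun-height part of the mid-plane argument can be checked with simple geometry. This is a minimal sketch that assumes a flat, untilted Galactic plane and keeps only the Sun's ∼25 pc offset; it deliberately ignores the Galactic center's offset from $(0, 0)$, which the abstract also folds in. An observer a height $z_\odot$ above the plane sees the true mid-plane at distance $d$ at latitude $b_{\rm mid} \approx -\arctan(z_\odot / d)$. The second helper, included to show what "a few pc" means at that distance, converts a latitude offset into a linear height; its 0.05° input is an arbitrary example value. Both function names are hypothetical.

```python
import math


def midplane_latitude_deg(z_sun_pc, distance_pc):
    """Latitude (deg) at which an observer z_sun_pc above a flat, untilted
    mid-plane sees that plane at distance distance_pc.

    Simplifying assumption: ignores the Galactic-center (0, 0) offset term.
    """
    return -math.degrees(math.atan2(z_sun_pc, distance_pc))


def height_offset_pc(delta_b_deg, distance_pc):
    """Linear height (pc) subtended by a latitude offset delta_b_deg at distance_pc."""
    return distance_pc * math.tan(math.radians(delta_b_deg))


b_mid = midplane_latitude_deg(25.0, 3100.0)          # Sun ~25 pc up, arm at 3.1 kpc
print(f"{b_mid:.2f} deg")                            # -0.46 deg
print(f"{height_offset_pc(0.05, 3100.0):.1f} pc")    # 2.7 pc
```

The Sun's height alone already shifts the apparent mid-plane most of the way from $b = 0^\circ$ toward the $b \approx -0.5^\circ$ quoted above; the Galactic-center term, omitted here, accounts for the remainder in the full treatment.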
Pof1

Alyssa Goodman

and 10 more

Screenshot
_This post accompanies a talk by the same name and author, presented at the 223rd Meeting of the American Astronomical Society, at 11:40 AM on January 6, 2014. Talk slides will be online after noon on January 6 at http://projects.iq.harvard.edu/seamlessastronomy/presentations._

ABSTRACT
In 1610, when Galileo pointed his small telescope at Jupiter, he drew sketches to record what he saw. After just a few nights of observing, he understood his sketches to be showing moons orbiting Jupiter. It was the visualization of Galileo's observations that led to his understanding of a clearly Sun-centered solar system, and to the revolution this understanding then caused. Similar stories can be found throughout the history of Astronomy, but visualization has never been as essential as it is today, when we find ourselves blessed with a larger wealth and diversity of data, per astronomer, than ever before. In this talk, I will focus on how modern tools for interactive “linked-view” visualization can be used to gain insight. Linked views, which dynamically update all open graphical displays of a data set (e.g. multiple graphs, tables and/or images) in response to user selection, are particularly important in dealing with so-called “high-dimensional data.” These dimensions need not be spatial, even though they often are (e.g. in the case of radio spectral-line cubes or optical IFU data). Instead, “dimensions” should be thought of as any measured attribute of an observation or a simulation (e.g. time, intensity, velocity, temperature, etc.). The best linked-view visualization tools allow users to explore relationships among all the dimensions of their data, and to weave statistical and algorithmic approaches into the visualization process in real time.
Particular tools and services will be highlighted in this talk, including: Glue (glueviz.org), the ADS All Sky Survey (adsass.org), WorldWide Telescope (worldwidetelescope.org), yt (yt-project.org), d3po (d3po.org), and a host of tools that can be interconnected via the SAMP message-passing architecture. The talk will conclude with a discussion of future challenges, including the need to educate astronomers about the value of visualization and its relationship to astrostatistics, and the need for new technologies to enable humans to interact more effectively with large, high-dimensional data sets.
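The linked-view idea can be expressed as an observer (publish/subscribe) pattern: a selection made in one display updates every other open display of the same data set. The sketch below is a hypothetical, tool-agnostic illustration of that pattern. It is not the API of Glue or any other tool named above, and the `LinkedSelection` class, the two view functions, and the toy velocity/intensity rows are assumptions.

```python
class LinkedSelection:
    """Hypothetical minimal hub: views register callbacks, and a selection
    made through the hub is broadcast to every registered view."""

    def __init__(self, data):
        self.data = data        # list of dicts, one per observation (row)
        self.views = []         # callables with signature view(data, selected)
        self.selected = set()   # indices of currently selected rows

    def register(self, view):
        self.views.append(view)
        view(self.data, self.selected)  # initial draw with the current selection

    def select(self, predicate):
        """Select rows matching `predicate` and notify all linked views."""
        self.selected = {i for i, row in enumerate(self.data) if predicate(row)}
        for view in self.views:
            view(self.data, self.selected)


def table_view(data, selected):
    print("table rows:", sorted(selected))


def scatter_view(data, selected):
    pts = [(data[i]["velocity"], data[i]["intensity"]) for i in sorted(selected)]
    print("scatter highlights:", pts)


rows = [
    {"velocity": 45.0, "intensity": 1.2},
    {"velocity": 60.0, "intensity": 3.4},
    {"velocity": 52.0, "intensity": 0.7},
]
hub = LinkedSelection(rows)
hub.register(table_view)
hub.register(scatter_view)
# e.g. a brush drawn on a velocity histogram: both views update together
hub.select(lambda r: r["velocity"] > 50)
```

A fuller tool might broadcast the selection rule itself rather than row indices, so that views of different but related data sets can each apply it; the broadcast step would stay the same.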
1nessie findingchart

Alyssa Goodman

and 6 more

Instructions for Co-Authors

The full file repository for this paper is at a shared Google Drive directory, https://drive.google.com/#folders/0BxIRxiTe1u6BcGlnUGt2ckU1Vms, shared with all co-authors. NOTE: THE “AAS” (PRESS CONFERENCE) SLIDES AT HTTPS://DRIVE.GOOGLE.COM/#FOLDERS/0BXIRXITE1U6BRKLQRZLUAUNUUUU GIVE A BETTER IDEA OF WHERE THIS DRAFT IS GOING THAN THE TEXT/FIGURES HERE AS OF NOW... AG WILL UPDATE ALL BY C.1/1/13!

The Mendeley Library “Nessie and Friends” used to house the references for this work, at http://www.mendeley.com/groups/2505711/nessie-and-friends/, but since Authorea works more directly with ADS links, we’ll use the ADS Private Library at http://adsabs.harvard.edu/cgi-bin/nph-abs_connect?library&libname=Nessie+and+Friends&libid=488e32b08b instead. The Mendeley library is the source of the nessie.bib file in the “Bibliography” folder here on Authorea, but I am not sure how to get the ADS references out as a .bib file. xxAlberto?xx

The Glue software used to intercompare the data sets in this work is online at http://glue-viz.readthedocs.org/en/latest/

We are using Authorea.com as an experimental platform to compile this paper. The manual steps we will need to take before submission include:
- download the LaTeX file
- modify the LaTeX file to use the AAS macros
- insert needed information (e.g. about authors, running header) into the AAS version of the LaTeX manuscript
- extract needed figures from the relevant folders here and bundle them with the LaTeX manuscript and macros
- create a .bib file from the ADS Private Library
- add the .bib file to the folder with the manuscript and figures
- fix in-line referencing so that the \citet and \citep commands work
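The last steps above (building the .bib file and getting \citet and \citep to resolve) can be sanity-checked before submission by listing any citation keys used in the manuscript that the .bib file does not define. This is a minimal sketch under stated assumptions: it assumes natbib-style \citet/\citep commands and a BibTeX file such as nessie.bib. Its regular expressions are simplified and will miss other citation commands, and the helper names and example keys are hypothetical.

```python
import re

# \citet or \citep, optional *, up to two [..] optional arguments, then {keys}
CITE_RE = re.compile(r"\\cite[tp]\*?(?:\[[^\]]*\]){0,2}\{([^}]*)\}")
# "@article{key," style entry headers (simplified; also matches @string etc.)
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex):
    """All keys cited with \\citet/\\citep in a LaTeX source string."""
    keys = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def bib_keys(bib):
    """All entry keys defined in a BibTeX source string."""
    return set(BIB_KEY_RE.findall(bib))


def missing_citations(tex, bib):
    """Cited keys with no matching .bib entry, sorted for readable output."""
    return sorted(cited_keys(tex) - bib_keys(bib))


# Hypothetical example; real use might read ms.tex and nessie.bib from disk.
tex = r"Found by \citet{jackson2010}; see \citep[e.g.][]{goodman2014, smith2099}."
bib = "@article{jackson2010,\n  title={x}}\n@article{goodman2014, title={y}}\n"
print(missing_citations(tex, bib))  # ['smith2099']
```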
Dear Professor Goodman,

Thank you for agreeing to be our speaker at the Pappalardo Distinguished Lecture this fall. Below are a tentative schedule and the lecture logistics for your talk.

Tentative Schedule: Thursday, October 2, 2014
Noon – 1:00pm Lunch with undergraduate and graduate physics students (8-304)
3:20pm – 3:30pm Set-up presentation in lecture room (10-250)
3:30pm Cookie Social (4-349)
4:05pm Lecture (10-250)
6:00 – 9:00pm Pappalardo Lecture Dinner (Location tbd)

Host: Your host is Professor Jesse Thaler (jthaler@MIT.EDU). Please let Prof. Thaler know if you have people in mind that you would like to see during your visit. For a list of MIT Physics faculty, please visit: http://web.mit.edu/physics/people/faculty/index.html

Title and abstract: Please forward a short bio, headshot, talk title, and abstract by Monday, August 18, 2014. I have your affiliation listed as Harvard University; can you confirm that this is correct? Once I have this information, I can begin publicizing the event. I ask that you send me this information at your earliest convenience or by the date set forth above. Your colloquium will be publicized to the general community at MIT as well as through the Boston Area Physics Council. The content of your talk should be aimed at an advanced undergraduate level.

Travel: Please let me know if I could be of assistance in arranging your travel to MIT’s campus. I can arrange for a parking spot near main campus if you need one.

Audio-Visual: Please indicate any audio-visual items you will need for your talk. If you do not request items in advance, we cannot guarantee they will be available. The room is equipped with an LCD/CRT projector for laptop presentations, a wired microphone for the podium, a wireless lapel microphone, and a laser pointer. Please alert me to any additional needs.

Filming: There is a possibility that we will be filming your talk to post onto an MIT website. If you have any questions regarding this, please let me know.
Reimbursement: After your trip, please forward all itemized receipts either by PDF or snail mail to my address below.

If you have any questions regarding your visit or need further information, please do not hesitate to contact me. Thank you, again, for agreeing to speak. We look forward to your talk.

Regards,
Nina

Nina Wu | Events and Development Coordinator
MIT, Department of Physics - 4-304
77 Massachusetts Avenue
Cambridge, MA 02139
Tel: 617.253.6259 | Fax: 617.253.8554