Jacob Hummel edited Introduction.tex  about 8 years ago

Commit id: 3fa29115d0304e5c5af5b472c900f9d508f25df4

In the past decade, astrophysical simulations have increased dramatically in both size and sophistication, and the typical size of the datasets produced has grown accordingly. However, the software tools for analyzing such datasets have not kept pace, such that one of the primary barriers to exploratory investigation is simply manipulating the data. This problem is particularly acute for users of the popular smoothed particle hydrodynamics (SPH) code GADGET \citep{SpringelYoshidaWhite2001,Springel2005}. GADGET is widely used across a broad range of astrophysical problems due to the ease with which it can be extended. Unfortunately, this also leads to fractionation of the data storage format, as each research group modifies the output to suit its needs. This state of affairs has historically forced significant duplication of effort, with individual research groups separately developing their own analysis scripts to perform similar operations.

Fortunately, the issue of data management and analysis is not unique to astronomy; it overlaps with the needs of the broader scientific community and the industrial community at large. Although adoption of the platform-independent Hierarchical Data Format (HDF5) for data storage helps mitigate some of these issues, being able to load a dataset into memory is only the first step in performing useful, insight-generating analysis.

Python is quickly becoming the language of choice for astronomers, and the analysis capabilities provided by the nascent pandas library will only strengthen that trend. Pandas is a thoroughly documented, open-source, BSD-licensed library, backed by a strong community of developers, that provides high-performance, easy-to-use data structures and data analysis tools for Python. With this in mind, we present a pandas-based framework for analyzing GADGET-HDF5 files: the GADGET dataframe library, or GADFLY.