Jacob Hummel edited Introduction.tex  about 8 years ago

Commit id: 75736e178488a1bb5d0478ca2e303d84e615bf7a


One of the biggest hurdles faced by researchers working with simulation data is figuring out how to analyze it. This is in some ways an intrinsic problem for science: everyone is interested in a different aspect of a simulation’s output and, to save space, stores only what they need, resulting in a different format every time. GADGET’s support for the HDF5 file format, however, provides at least a partial answer to this problem. Because HDF5 files are self-describing, anyone who can read one can perform exploratory analysis of the data without needing to know anything else about the file structure. The same cannot be said for GADGET’s alternative binary output format.

While the HDF5 data model provides a solid starting point, being able to read in a dataset is only the first step toward useful, insight-generating analysis. Python is quickly becoming the language of choice for astronomers, and the analysis capabilities provided by the nascent pandas library will only strengthen that trend. Pandas is a thoroughly documented, open-source, BSD-licensed Python library, backed by a strong community of developers, that provides high-performance, easy-to-use data structures and data analysis tools. With this in mind, we present a pandas-based framework for analyzing GADGET HDF5 files: the GADGET dataframe library, or GADFLY.

This project is in no way intended to replace the far more feature-complete yt or pynbody projects. Rather, we focus on implementing the minimum functionality necessary to interface between simulation data in the GADGET HDF5 format and the pandas data analysis library.
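
To make the idea of interfacing GADGET HDF5 data with pandas concrete, the following is a minimal sketch and not GADFLY's actual API. The function names `group_to_dataframe` and `read_particles` are hypothetical. We assume the usual GADGET snapshot layout, with one group per particle type (e.g. `PartType0`) containing datasets such as `Coordinates`, `Velocities`, and `ParticleIDs`; other fields vary with the simulation setup and should be checked against the file at hand.

```python
import numpy as np
import pandas as pd


def group_to_dataframe(group, fields=("Coordinates", "Velocities")):
    """Build a DataFrame from a mapping of dataset name -> array.

    `group` may be an open h5py group or any dict-like object.
    Hypothetical helper for illustration; not part of GADFLY.
    """
    columns = {}
    for name in fields:
        data = np.asarray(group[name])
        if data.ndim == 1:
            # Scalar quantity: one column per field.
            columns[name] = data
        else:
            # Vector quantity: one column per component.
            for i in range(data.shape[1]):
                columns[f"{name}_{i}"] = data[:, i]
    # Index rows by particle ID so particles can be matched across snapshots.
    index = pd.Index(np.asarray(group["ParticleIDs"]), name="ParticleIDs")
    return pd.DataFrame(columns, index=index)


def read_particles(path, ptype="PartType0", **kwargs):
    """Read one particle type from a GADGET HDF5 snapshot (assumed layout)."""
    import h5py  # imported here so the helper above also works without h5py

    with h5py.File(path, "r") as snapshot:
        return group_to_dataframe(snapshot[ptype], **kwargs)
```

Once the data are in a DataFrame, the standard pandas tools apply directly; for example, `read_particles(path).describe()` would summarize every column of the selected particle type, where `path` points to a snapshot file.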