Jacob Hummel edited Introduction.tex  about 8 years ago

Commit id: a97e7a96f72694f41312cf1d99997d4659ee35bc

This state of affairs has historically forced significant duplication of effort, with individual research groups separately developing their own analysis scripts to perform similar operations. Fortunately, the problem of data management and analysis is not unique to astronomy, and the resulting overlap with the needs of the broader scientific community and of industry provides a large pool of scientific software developers to tackle these common problems. In recent years, this broader community has settled on Python as its programming language of choice due to its efficacy as a `glue' language and the rapid development it allows. This has led to a robust scientific software ecosystem, with packages for numerical data analysis like NumPy (Oliphant 2006; Van Der Walt et al. 2011), SciPy (Jones et al. 2001), pandas (McKinney 2010), and scikit-image; Matplotlib and seaborn for plotting; scikit-learn for machine learning; and statistics and modeling packages like scikits-statsmodels, pymc, and emcee \citep{Foreman-Mackeyetal2013}. Python is quickly becoming the language of choice for astronomers as well, with the advent of the Astropy project \citep{Robitailleetal2013} and its affiliated packages, and the analysis capabilities provided by the nascent pandas library will only strengthen that trend. Pandas is a thoroughly documented, open-source, BSD-licensed library with a strong developer community, providing high-performance, easy-to-use data structures and data analysis tools for Python.
With this in mind, we present a pandas-based framework for analyzing GADGET HDF5 files: the GADGET dataframe library, or GADFLY. This project is in no way intended to replace the far more feature-complete yt or pynbody projects. Rather, we focus on implementing the minimum functionality necessary to interface between simulation data in the GADGET HDF5 format and the pandas data analysis library.%Adoption of the platform-independent Hierarchical Data Format (HDF5) for data storage helps mitigate some of these issues, being able to load a dataset into memory is only the first step in performing useful, insight-generating analysis.