Jacob Hummel edited Introduction.tex  about 8 years ago

Commit id: e55d4e2801b8995c6b2852886e477cf258107cc7

Fortunately, the challenges of data management and analysis are not unique to astronomy, and the resulting overlap with the needs of the broader scientific community and of industry at large provides a large pool of software developers to tackle these common problems. In recent years, this broader community has largely settled on Python as its programming language of choice, owing to its effectiveness as a `glue' language and the rapid development it allows. The result is a robust scientific software ecosystem, including NumPy (Oliphant 2006; Van Der Walt et al. 2011), SciPy (Jones et al. 2001), pandas (McKinney 2010), and scikit-image for numerical data analysis; Matplotlib (Hunter 2007) and seaborn for plotting; scikit-learn for machine learning; and statistics and modeling packages such as scikits-statsmodels, PyMC, and emcee \citep{Foreman-Mackeyetal2013}. Python is quickly becoming the language of choice for astronomers as well, with the Astropy project \citep{Robitailleetal2013} and its affiliated packages providing a coordinated set of Python tools that implement the core astronomy-specific functionality needed by researchers. Flexible Python packages for analyzing and visualizing astrophysical simulation data, such as yt \citep{Turketal2011} and pynbody \citep{Pontzenetal2013}, together with the analysis capabilities of the relatively young pandas library, will only strengthen this trend. The pandas library is thoroughly documented, open source, and BSD-licensed; it provides high-performance, easy-to-use data structures and data analysis tools for Python and is supported by a strong community of developers. With this in mind, we present a pandas-based framework for analyzing GADGET-HDF5 files: the GADGET dataframe library, or GADFLY.