
Jacob Hummel edited Framework.tex about 8 years ago

Commit id: d1b3dd9a98967c978e68907f52be493a7d7b4e08


There are several motivations for building an analysis framework around the \code{pandas.DataFrame}. Most importantly, \code{pandas} is a thoroughly documented, open-source, BSD-licensed library that provides high-performance, easy-to-use data structures and analysis tools, and it has a strong community of developers working to improve it. Second, because \code{pandas} is becoming the de facto standard for data analysis in Python, building on it lets us leverage the library's existing capabilities and simplifies interoperability with the rest of the tools provided by the broader scientific Python ecosystem. Finally, using \code{pandas.DataFrame} rather than \code{numpy} arrays as the primary data container makes it much easier to keep different particle properties indexed correctly, while still affording the flexibility to load and remove data from memory at will.

\subsection{Data Organization}

\subsection{Snapshot--DataFrame Interface}

\subsection{Organizational Structure}
\label{hierarchy}

The framework makes some assumptions about how the simulation output is organized. Each particle type is expected to be stored in its own group, labeled \code{PartType0}, \code{PartType1}, etc. An additional group, \code{Header}, is also expected; it contains metadata for the simulation snapshot as HDF5 attributes.
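
The layout described above can be illustrated with a minimal sketch. It is not the framework's own interface: it assumes the \code{h5py} library is used to read the file, and the dataset names \code{ParticleIDs} and \code{Masses}, the header attribute names, and all helper functions are hypothetical, chosen only for illustration. The sketch writes a tiny snapshot with a \code{Header} group and a \code{PartType0} group, reads the header attributes into a dictionary, and loads one particle type into a \code{pandas.DataFrame} indexed by particle ID.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def write_demo_snapshot(path):
    """Write a tiny file with the assumed layout: Header + PartType0."""
    with h5py.File(path, "w") as f:
        header = f.create_group("Header")
        # Attribute names are hypothetical; the text only says that
        # metadata is stored as HDF5 attributes on the Header group.
        header.attrs["Time"] = 0.5
        header.attrs["BoxSize"] = 100.0
        gas = f.create_group("PartType0")
        # Dataset names are assumptions, not taken from the text.
        gas.create_dataset("ParticleIDs", data=np.array([12, 7, 42]))
        gas.create_dataset("Masses", data=np.array([1.0, 2.0, 3.0]))


def read_header(path):
    """Return the Header attributes as a plain dictionary."""
    with h5py.File(path, "r") as f:
        return dict(f["Header"].attrs)


def load_particles(path, part_type, fields):
    """Load one-dimensional fields of one particle type into a DataFrame."""
    with h5py.File(path, "r") as f:
        group = f[f"PartType{part_type}"]
        ids = group["ParticleIDs"][:]
        data = {name: group[name][:] for name in fields}
    return pd.DataFrame(data, index=pd.Index(ids, name="ParticleIDs"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot_demo.hdf5")
    write_demo_snapshot(path)
    header = read_header(path)
    gas = load_particles(path, 0, ["Masses"])

# A property supplied later in a different order is matched to the
# right particles through the particle-ID index, not by row position.
gas["Density"] = pd.Series([0.3, 0.1, 0.2], index=[42, 12, 7])
```

Indexing each frame by particle ID is what keeps separately loaded properties consistent: the column assignment at the end aligns on the index, so the order in which values arrive does not matter. A column can later be released from memory with \code{gas.drop(columns="Density")}.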