Authorea

Jacob Hummel edited 2-Memory.tex about 8 years ago

Commit id: 02311ad64adab3d30e3748e3349baccf6abe2aec

Enabling this requires intelligent memory management, loading only the particle data of interest from the disk. Fortunately the HDF5 protocol is well-suited to such non-contiguous file access, allowing not only individual data fields to be accessed independently, but also for loading only select entries from the field in question. \code{gadfly} employs two complementary approaches to minimizing the memory footprint. The first method requires definition of an optional refinement criterion, such as particles above a given density threshold. The resulting `refined' index can then be used to select only the corresponding values from subsequent particle fields as they are loaded. While this method efficiently minimizes I/O operations, it is fairly rigid: it is poorly suited to exploratory analysis, where the proper refinement criterion may not be known {\it{a priori}}, and is best suited for use in scripts where the analysis to be performed is well defined beforehand. Naively attempting to load additional fields into a dataframe from which particles have been manually dropped will fail, as the particle indices will no longer match. To mitigate the indexing issues that can arise when data is loaded incrementally, \code{gadfly} performs an intermediate step, first loading new fields into a \code{pandas.Series} data structure, using the particle ID numbers as an index. This allows \code{pandas} to properly align the newly loaded fields, dropping any particles not in the existing \code{PartType} dataframe from the newly loaded field as it is appended. This second approach, which can be used in tandem with the refinement index, affords \code{gadfly} the flexibility needed to allow incremental manual refinement of the data stored in memory. Additional cuts can be made as subsequent fields are loaded, resulting in the selection of a precisely targeted primary dataset from which derived properties (e.g., temperature) may be calculated, serving to reduce computational overhead as well.
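
The two approaches can be illustrated with a minimal sketch in plain \code{pandas}. This is not the \code{gadfly} implementation: the in-memory arrays below are hypothetical stand-ins for fields stored in an HDF5 snapshot, and every variable name and value is an assumption chosen for illustration. The sketch shows a `refined' index selecting entries from each field, followed by an ID-indexed \code{pandas.Series} being aligned against a dataframe from which a particle has been manually dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for particle fields stored on disk (illustrative values only).
particle_ids = np.array([101, 102, 103, 104, 105, 106])
density = np.array([0.5, 12.0, 3.0, 40.0, 0.1, 25.0])
temperature = np.array([900.0, 150.0, 400.0, 80.0, 1200.0, 110.0])

# Approach 1: a refinement criterion (here, a density threshold) yields a
# 'refined' index of positions within each field.
refined = np.flatnonzero(density > 10.0)

# Only the refined entries of each field are selected; the particle IDs
# become the dataframe index.
df = pd.DataFrame(
    {"density": density[refined]},
    index=pd.Index(particle_ids[refined], name="particle_id"),
)

# Manual refinement: drop one particle from the data held in memory.
# After this, positional indices no longer match the refined index.
df = df.drop(index=104)

# Approach 2: load the new field into a Series indexed by particle ID first.
new_field = pd.Series(
    temperature[refined], index=particle_ids[refined], name="temperature"
)

# Column assignment aligns on the index, so the entry for particle 104,
# which is no longer in the dataframe, is discarded from the new field.
df["temperature"] = new_field

print(df)
```

Because alignment is done on particle ID rather than on position, the order in which cuts are made and fields are loaded does not matter in this sketch; assigning the raw array \code{temperature[refined]} directly would instead raise a length-mismatch error.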