Jacob Hummel edited 2-Memory.tex  about 8 years ago

Commit id: 02311ad64adab3d30e3748e3349baccf6abe2aec

Enabling this requires intelligent memory management, loading only the particle data of interest from disk. Fortunately, the HDF5 format is well suited to such non-contiguous file access, allowing not only individual data fields to be accessed independently, but also only selected entries to be loaded from the field in question. \code{Gadfly} employs two complementary approaches to minimizing the memory footprint. The first method involves defining an optional refinement criterion, such as particles above a given density threshold. The resulting `refined' index can then be used to select only the corresponding values from subsequent particle fields as they are loaded. While this method efficiently minimizes I/O operations, it is fairly rigid, and naively attempting to load additional fields into a dataframe from which particles have been manually dropped will fail, as the particle indices will no longer match. As such, this approach is poorly suited to exploratory analysis where the proper refinement criterion may not be known {\it{a priori}} and is best suited for use in scripts where the analysis to be performed is well defined. To mitigate the indexing issues that arise in situations where data are loaded incrementally, \code{gadfly} performs an intermediate step, first loading new fields into a \code{pandas.Series} data structure, using the particle ID numbers as an index.
This allows \code{pandas} to properly align the particle fields, dropping any particles not in the existing \code{PartType} dataframe from the newly loaded field as it is appended. This approach, which can be used in tandem with the refinement index, affords \code{gadfly} the flexibility needed to allow incremental manual refinement of the data stored in memory. Additional cuts can be made as subsequent fields are loaded, resulting in the selection of a precisely targeted primary dataset from which derived properties (e.g., temperature) may be calculated, serving to reduce computational overhead as well.
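
The two approaches can be illustrated with a minimal sketch that uses only \code{numpy} and \code{pandas}. This is not the \code{gadfly} implementation: the in-memory arrays stand in for HDF5 datasets on disk, and the variable names (\code{particle\_ids}, \code{refined}, \code{gas}) and all values are hypothetical. The sketch assumes particle IDs are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for particle fields stored on disk.
particle_ids = np.array([101, 102, 103, 104, 105, 106])
density = np.array([0.1, 5.0, 0.3, 8.0, 12.0, 0.2])
temperature = np.array([900.0, 250.0, 800.0, 200.0, 150.0, 950.0])

# Approach 1: a refinement criterion yields a positional 'refined' index,
# which is reused to select matching entries from each field as it is loaded.
refined = np.flatnonzero(density > 1.0)

gas = pd.DataFrame(
    {"density": density[refined]},
    index=pd.Index(particle_ids[refined], name="particle_id"),
)

# A manual cut made during exploration: drop one particle by its ID.
gas = gas.drop(index=104)

# Naive positional assignment now fails, because the refined field has
# three entries while the dataframe holds only two particles.
try:
    gas["temperature"] = temperature[refined]
    naive_failed = False
except ValueError:
    naive_failed = True

# Approach 2: wrap the new field in a Series indexed by particle ID.
# Column assignment aligns on the index, so entries for particles that
# are no longer in the dataframe are discarded.
new_field = pd.Series(
    temperature[refined], index=particle_ids[refined], name="temperature"
)
gas["temperature"] = new_field

print(gas)
```

Because the alignment is done on particle ID rather than on position, further cuts can be applied between loads without tracking how the row order has changed.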