Jacob Hummel edited 2-Memory.tex
about 8 years ago
Commit id: 02311ad64adab3d30e3748e3349baccf6abe2aec
Enabling this requires intelligent memory management, loading only the particle data of interest from the disk.
Fortunately the HDF5 protocol is well-suited to such non-contiguous file access, allowing not only individual data fields to be accessed independently, but also for loading only select entries from the field in question.
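A sketch of this capability with \code{h5py} is shown below; the \code{PartType0} dataset names mirror the Gadget-style HDF5 layout, but the values and file are purely illustrative:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small HDF5 file with a Gadget-style layout (values are made up).
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("PartType0/Density", data=np.arange(10.0))
    f.create_dataset("PartType0/Masses", data=np.full(10, 0.5))

# One field can be opened independently of the others, and fancy indexing
# reads only the requested entries from disk -- the rest of the file is
# never loaded into memory.
with h5py.File(path, "r") as f:
    subset = f["PartType0/Density"][[1, 4, 7]]

print(subset)  # -> [1. 4. 7.]
```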
\code{gadfly} employs two complementary approaches to minimizing the memory footprint.
The first method requires definition of an optional refinement criterion, such as particles above a given density threshold.
The resulting `refined' index can then be used to select only the corresponding values from subsequent particle fields as they are loaded.
While this method efficiently minimizes I/O operations, it is fairly rigid: attempting to load additional fields into a dataframe from which particles have been manually dropped will fail, as the particle indices will no longer match.
As such, this approach is poorly suited to exploratory analysis where the proper refinement criterion may not be known {\it{a priori}}, and is best suited for use in scripts where the analysis to be performed is well defined.
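The refinement-index pattern can be sketched in plain \code{numpy}/\code{pandas}; this is not the actual \code{gadfly} interface, and all names and values here are illustrative:

```python
import numpy as np
import pandas as pd

# Refinement criterion: keep only particles above a density threshold.
density = np.array([0.1, 5.0, 0.3, 9.0, 2.5])
refined = np.flatnonzero(density > 1.0)  # indices of particles to keep

gas = pd.DataFrame({"density": density[refined]}, index=refined)

# Subsequent fields are sliced with the same refined index as they are
# loaded, so only the selected entries ever occupy memory.
internal_energy = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gas["internal_energy"] = internal_energy[refined]

# Note the rigidity: if rows are manually dropped from `gas` afterwards,
# `refined` no longer matches the dataframe and further loads would fail.
```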
To mitigate the indexing issues that can arise in situations where data is loaded incrementally, \code{gadfly} performs an intermediate step, first loading new fields into a \code{pandas.Series} data structure, using the particle ID numbers as an index.
This allows \code{pandas} to properly align the particle fields, dropping any particles not in the existing \code{PartType} dataframe from the newly loaded field as it is appended.
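The alignment step can be illustrated with plain \code{pandas}; the field names and particle IDs below are illustrative, not taken from \code{gadfly}:

```python
import pandas as pd

# Existing dataframe indexed by particle ID; two particles have already
# been dropped by a manual cut.
gas = pd.DataFrame({"density": [5.0, 9.0, 2.5]}, index=[101, 103, 104])

# A newly loaded field arrives for ALL particles, as a Series indexed by
# particle ID.
temp = pd.Series([1e4, 2e4, 3e4, 4e4, 5e4],
                 index=[100, 101, 102, 103, 104])

# Assignment aligns on the shared index: only IDs already present in
# `gas` are kept, and entries 100 and 102 are dropped automatically.
gas["temperature"] = temp
```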
This approach, which can be used in tandem with the refinement index, affords \code{gadfly} the flexibility needed to allow incremental manual refinement along several axes of the data stored in memory.
Additional cuts can be made as subsequent fields are loaded, resulting in the selection of a precisely targeted primary dataset from which derived properties (e.g., temperature) may be calculated, serving to reduce computational overhead as well.
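A sketch of such an incremental cut followed by a derived-property calculation; the temperature formula here is a placeholder, not \code{gadfly}'s actual calculation:

```python
import pandas as pd

gas = pd.DataFrame({"density": [5.0, 9.0, 2.5],
                    "internal_energy": [20.0, 40.0, 50.0]},
                   index=[101, 103, 104])

# An additional cut applied after loading further shrinks the dataset.
gas = gas[gas["density"] > 3.0].copy()

# Derived quantities are then computed only for the surviving particles,
# reducing computational overhead (placeholder formula).
gas["temperature"] = 2.0 * gas["internal_energy"] / 3.0
```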