Main Data History
Show Index Toggle 0 comments
  •  Quick Edit
  • MPSLIB: A C++ class for sequential simulation of multiple-point statistical models

    Thomas Mejer Hansen1, Le Thanh Vu2, Torben Bach3

    This is the postprint of the following manuscript, to be published in Sowftware X in 2016:
    Hansen, T.M., Vu, L.T., and Bach, Torben, 2016. MPSLIB: A C++ class for sequential simulation of multiple-point statistical models. SoftwareX. doi:10.1016/j.softx.2016.07.001.

    1. Niels Bohr Instiute, University of Copenhagen, Denmark

    2. I-GIS, Aarhus, Denmark

    3. I-GIS, Aarhus, Denmark


    Geostatistical simulation methods allow simulation of spatial structures and patterns based on a choice of statistical model. In the last few decades multiple-point statistics (MPS) has been developed that allow inferring the statistical model from a training image. This allows for a simpler quantification of the statistical model, and simulation of more realistic (Earth) structures. A number of different algorithms for MPS based simulation have been proposed, each associated with a unique set of pros or cons. MPSLIB is a C++ class that provides a framework for implementing most of the currently proposed multiple-point simulation methods based on sequential simulation. A number of the most widely used methods are provided as an example. The single normal equation simulation (SNESIM) method is implemented using both a tree and a list structure. A new generalized ENESIM (GENESIM) algorithm is proposed that can act as (in one extreme) the ENESIM algorithm, and (in another extreme) similar to the direct sampling algorithm. MPSLIB aims to be easy to compile on most platforms (standard C++11 is the only requirement) and is released under the Open Source LGPLv3 License to encourage reuse and further development.

    Motivation and significance

    \label{sec:introduction} Geostatistics is type of statistics with a focus on describing geo-spatial (Earth) structures in a probabilistic framework through a probability distribution. A ‘geostatistical model’ refer to a selection of a probability distribution to reflect a specific Earth model. Geostatistical models are typically used to characterize subsurface reservoir models used for groundwater, hydrocarbon or heat storage, for both exploration, exploitation, and management. A geostatistical model describes infinitely many single Earth models, consistent with the chosen probability function. The variability of these Earth models reflect the uncertainty related to the choice of statistical model. Such uncertainty quantification is the key advantage of using geostatistical models, as opposed to considering only one, in some sense, optimal model.

    Geostatistical modeling is a two-step process. First a statistical model must be selected or described, typically through a probability distribution \(f({\bf m})\). Once a model has been established, actual realizations from the probability distribution is generated using ‘simulation algorithms’, which is the topic of this manuscript.

    Traditionally, geostatistical simulation algorithms are based on Gaussian statistics, typically based on statistics only between pairs of model parameters, and hence referred to as 2-point statistics. These methods has been, and are still, widely used (Deutsch 1998). However, 2-point based statistical models are limited with respect to the spatial structures they can represent. With the introduction of multiple-point statistical (MPS) models, more geologically realistic spatial structures can be quantified (Guardiano 1993, Strebelle 2002). This has led to the development of a number of simulation algorithms for multiple-point simulation, e.g. (Guardiano 1993, Strebelle 2002, Straubhaar 2011, Mariethoz 2010). For MPS based statistical models, there is usually no closed form analytical expression of \(f({\bf m})\). Instead, a ‘training image’ or a ‘sample model’ is assumed available from which multiple-point statistical events can be inferred. The goal of MPS methods is to allow sampling from the unknown \(f({\bf m})\) such that realizations are consistent with the statistics that can be inferred from the training image. For an overview of MPS based simulation algorithms see (Mariethoz 2014).

    Many of the proposed MPS algorithms are implemented in various forms. However, some of these codes are either not maintained (Remy 2002, Remy 2009), not available on multiple platforms (Remy 2009), not available under an open source license (Ar2Tech 2015, Ephesia Consult 2015, Ephesia Consult 2015a), or does only implement one specific type of MPS algorithm (Strebelle 2015).

    Here a C++ library, MPSLIB, designed specifically for multiple-point simulation is presented, that is released under the GPLv3 license. MPSLIB implements the core functionalities needed to implement any multiple-point simulation algorithm, based on sequential simulation (Alabert 1989). Note that this does not include methods based on pattern matching (Wu 2008) and image-synthesis (Mariethoz 2014a). Implementations of the ‘extended normal equations simulation’ (ENESIM) (Guardiano 1993) and the ‘single normal equation simulation’ (SNESIM) (Strebelle 2002, Straubhaar 2011) algorithms are provided as examples. Further, a new algorithm GENESIM, based on the ENESIM algorithm, is proposed that can act similar to both the ENESIM and the direct sampling algorithms (Mariethoz 2010).

    Software description

    \label{sec:description} MPSLIB is written in C++ (Stroustrup 1986) using c++11 (ISO 2011). It consist of a C++ class that provides the framework for applying sequential simulation to sample from multiple-point statistical models. It also contains a number of algorithms implemented using MPSLIB.

    At the core of the library is an implementation of sequential simulation, that can be briefly described as follows: Say a probability distribution describes the relation between \(M\) model parameters through \(f({\bf m})\)=\(f(m_1, m_2, ..., m_M)\) that are typically ordered by some nodes on a grid. Then a realization of \(f({\bf m})\) can be generated using sequential simulation as follows:

    1. Visit a random node, say \(m_1\), and generate a realization of \(f({m_1})\) as \(m_1^*\).

    2. Move to another node, say \(m_2\), and generate a realization of \(f({m_2 | m_1^*})\) as \(m_2^*\).

    3. Move to another node, say \(m_3\), and generate a realization of \(f({m_3 | m_1^*, m_2^*})\) as \(m_3^*\).

    4. ...

    5. Move to the last node, \(m_M\), and generate a realization of \(f({m_M | m_1^*, m_2^*, ..., m_{M-1}^*})\) as \(m_M^*\).

    This will generate a realization of \({\bf m^*} = [m_1^* , m_2^* ,..., m_M^* ,]\) from \(f(\bf m)\). See more details in e.g. (Gomez-Hernandez 1993).