trajectory spawning

Shang gave his talk today about the trajectory sampling method that Eric Vanden Eijden came up with. One problem he had was that a uniform distribution on the central sphere leads to very poor coverage of space because the trajectories diverge very fast in some regions and bunch up in other regions. This section will be about the idea of adaptively spawning new trajectories in regions where the trajectories are diverging.

note: \(\vec{b}(x)\) is a vector field which is aligned with the gradient at \(x\). I will use \(b\) instead of the gradient because I don’t fully understand the relationship between them. \(b\) has units length / time. The gradient has units energy / length. At this point I think they are just related by a constant.

First, just to orient ourselves, here are some equations from Shangs’s talk and Eric’s notes. \[V_{basin} = V_{ref} + F(\rho) \label{eqn:basin_volume_from_traj}\] where \(V_{ref}\) is the volume of the spherical reference surface enclosing the minimum. When averaging over points generated uniformly in the volume \(F(\rho ) = S_{ref} \left< \rho^{-1} \right>_{basin}^{-1}\). For our purposes, when generating points uniformly on the surface, we will use \[F(\rho) = \int_{surface} \rho (x) d \sigma (x) = S_{ref} < \rho >_{surface}\] This is just \(\rho\) integrated over the surface. This high dimensional integral will be done numerically (probably by averaging).

\[\rho(x) = \left| \vec{n}(x) \cdot \vec{b}(x) \right| \int J(x, t) dt\]

Where \(\vec{n}(x)\) is the normal to the surface at \(x\). Time \(t\) is a measure of the progress along the trajectory. At \(t=0\) the trajectory is at the surface and it heads to the basin boundary for large \(t\).

The Jacobian is updated according to \[J(x, t+dt) = J(x,t) \left[ 1 + \vec{\nabla} \cdot \vec{b}(X(t, x)) dt \right]\] where \(X(t, x)\) is the position along the trajectory starting from \(x\) at time \(t\). Note that \(\vec{\nabla} \cdot \vec{b}\) is the divergence of \(\vec{b}\) (written in Eric’s notes as \(Tr \vec{\nabla} \vec{b}\)). This is related to the trace of the Hessian \(H\) (they are simply proportional if \(\vec{b}\) is proportional to the gradient). \(J(x, t=0)\) is simply set equal to 1.

For a trajectory starting from \(x\) a small surface element of area \(d\sigma\) (normal to the trajectory) sweeps out a volume, \(|\vec{b}(x)| d\sigma \int J(x, t) dt\) (should check this). If \(ds = |\vec{b}(X(x,t))| dt\) has units length and measures the distance along the trajectory, then \[\chi(x,t) = J(x,t) \frac{\left|\vec{b}(x)\right|}{\left|\vec{b}(X(x,t))\right|}\] is the factor by which the surface element has increased or decreased by time \(t\). (I’m not totally sure about this)

The idea was to spawn new trajectories when the density of trajectories becomes very low. This can be implemented by spawning new trajectories when \(\chi\) is larger than a factor \(\alpha > 1\). At this point the local density of trajectories has decreased by a factor of \(\alpha\).

At time \(t\) into trajectory starting from \(x_i\), we can spawn \(N\) new trajectories from a surface of size \(d\sigma_i \chi(x_i, t)\) that is perpendicular to the trajectory. If we want to keep the density constant we can choose \(N \sim \alpha^{-1}\). The starting points for these trajectories are chosen somehow (randomly) from he surface. At this point trajectory \(i\) dies and is replace by \(N\) new ones. Those propagate forward and potentially spawn new trajectories.

This is describing a tree structure. Each trajectory has the potential to spawn children and the children can spawn new children. There is never any interaction between trajectories, everything is computed from local quantities, so the computation is still fairly trivially parallelizable.

How do we reconstruct the total volume from all these spawning events? To do so, we use a local variant of eq. \ref{eqn:basin_volume_from_traj}. If trajectory \(i\) spawns children at time \(t\) then the surface element will have transformed to be size \(\sigma_i^{end} = \sigma_i^{start} \chi(x_i, t)\). From this surface we spawn new trajectories. The total downstream volume will be \[V_i^{DS} = \int_{\sigma_{i}^{end}} \rho(x) d \sigma(x) = S_{i}^{end} \left< \rho(x) \right>_{x \in \sigma_{i}^{end}}\]

The recursive equation for \(\rho\) becomes \[\rho(x) = \frac{\left| \vec{n}(x) \cdot \vec{b}(x) \right|}{\left| \vec{b}(x) \right|} \left[ \left| \vec{b}(x) \right| \int_0^{t_{end}} J(x, t) dt + \chi(x, t_{end}) \left< \rho(y) \right>_{y \in \sigma_{i}^{end}} \right]\] In the above equation \(\sigma_i^{start}\) does not appear because \(\rho\) only computes the factor by which the surface increases, not the actual number. That will reappear when we get back to the reference volume (the central sphere)

trajectory spawning as message passing

This can be re-formulated in the language of the message passing algorithm in which there are two types of nodes. The spawn surfaces are factor nodes, and the trajectories are variable nodes (we’ll call them trajectory nodes). The graph is bipartite, in that trajectory node are only connected to factor nodes and visa versa. Again, since we are only using local information, the graph will be a tree. The nodes are created recursively by sweeping outwards from the center of the basin. The volume is computed by passing messages from outside in.

The messages from trajectory nodes to factor nodes will simply be the \(\rho\) for that trajectory. \[\mu_{t \to f} = \frac{\left| \vec{n}(x) \cdot \vec{b}(x) \right|}{\left| \vec{b}(x) \right|} \left[ \left| \vec{b}(x) \right| \int_0^{t_{end}} J(x, t) dt + \chi(x, t_{end}) \left< \rho(y) \right>_{y \in \sigma_{i}^{end}} \right]\] At a factor node (spawn surface) you combine all the incoming messages to estimate the relative downstream volume (you need to multiply by a surface element to make it an actual volume). \[\mu_{f \to t} = \frac{1}{\sigma} \int_{\sigma} \rho(x) d \sigma(x) = \left< \rho(x) \right>_{x \in \sigma} = \left< \mu_{t \to f}(x) \right>_{x \in \sigma}\] This message is then passed upstream to the trajectory which initiated the spawning event. This allows us to re-write the \(\mu_{t \to f}\) in terms of the single incoming message \[\mu_{t \to f} = \frac{\left| \vec{n}(x) \cdot \vec{b}(x) \right|}{\left| \vec{b}(x) \right|} \left[ \left| \vec{b}(x) \right| \int_0^{t_{end}} J(x, t) dt + \chi(x, t_{end}) \mu_{f \to t} \right]\]

The leaf nodes will be the trajectories that encounter the basin boundary before spawning. The incomming messages for those nodes \(\mu_{f \to t}\) can be taken to be zero.

note: At first I had the projection part, that includes the term \(\vec{n}(x(0))\) as part of the factor node. It seemed logical because it is evaluated at the spawning surface (which is the factor). But I think computationally it will be simpler this way because the messages that are passed are simply numbers and all the computation involving the potential and high dimensional vectors can be done at runtime during the trajectory.

Note: For all spawned surfaces we should have \(\hat{n}\) parallel to \(\vec{b}\). This may not be actually true if we assume a flat surface and the divergence of \(\vec{b}\) is large. We should think about whether we actually want to evaluate this dot product. The numerics might be most accurate if we just assume they are parallel.

note on units: The end result should not depend on energy, so all units with energy should cancel. This means that \(\left| \vec{n}(x(0)) \cdot \vec{\nabla} U(x(0)) \right|\) should actually be \(\left| \vec{n}(x(0)) \cdot \vec{\nabla} U(x(0)) \right| / \left| \vec{\nabla} U(x(0)) \right| \). Additionally \(Tr(H) dt\) must be unitless. That means that either \(dt\) has units length\(^2\) / energy, or there is a term missing.

velocity field interpretation

The way Eric has it set up \(\vec{b}\) is a velocity field, and the volume elements throughout the basin flow along the velocity field toward the central reference surface. You can think about this as the flow of some material that is initially at constant density throughout the volume. Then you measure the total amount of material that flows through the surface. The amount of material that flows through a small surface element \(d\sigma(x)\) is \(d \vec{\sigma}(x) \cdot \vec{b} \int_0^{\infty} J(x,t) dt\). Since the velocity field is never zero outside the reference surface (ignoring the basin boundary) the time for all the material to flow is always finite. The maximum time (the practical upper bound on the integral) is determined by the magnitude of the velocity. Thus changing \(\vec{b}\) by a constant factor just rescales the time axis. So if \(\vec{b} = -\vec{\nabla} U\) then time really does have units length\(^2\) / energy. If you double the energy, then you should half your step in time keep the steps in space constant.

Time can be thought of as parameterizing the distance along a path (trajectory). Thus, since the path is defined by following the vector field \(\vec{b}\), we can always replace \(\int dt\) with \(\int_{path} dx |b(x)|^{-1}\). For me, this is a more natural way of thinking about it and more closely resembles what will happen on the computer. Additionally it explicitly cancels the irrelevent units of energy. Also, when evaluating these integrals numerically we are free to break them up however we want. That is we are free to choose the time step (or step in space) however we want. Probably we can use second derivative information (\(Tr(H) \sim \vec{\nabla} \cdot \vec{b}\)) to take small steps where things are changing most rapidly.

Since we have that \(d \vec{\sigma}(x) \cdot \vec{b}(x) \int_0^{\infty} J(x,t) dt\) is the total amount of material flowing through \(d \sigma(x)\). If we define \(d \vec{\sigma}\) to be the surface element normal (parallel?) to \(\vec{b}\) then \(d \sigma(x) b(x) J(x,t)\) dt is the amount of material flowing through \(d\sigma (x)\) at time \(t\) in time interval \(dt\). Replacing \(dt\) with \(ds / b(X(x,t))\) gives \(d \sigma(x) \chi(x,t) ds\) is the amount of material in the volume element \(d \sigma(x) ds\) at time \(t\). Or equivalently, \(\chi(x, t)\) is the density at time \(t\). It is unitless because it is the “density of volume”.