Abstract

I present the progress that has been made towards implementing a Square Root Information Filter (SRIF) in the asteroid-modelling software **shape**. I compare SRIF’s performance with that of the current fitting algorithm and conclude that SRIF has the potential to perform substantially better than its predecessor. I show the results of SRIF operating on previously collected delay-Doppler data for the asteroid 2000 ET70. I also discuss potential future changes to improve **shape**’s fitting speed and accuracy.

Radar is a powerful tool for gathering information about bodies in the Solar System. Radar allows the orbital elements of both planets and minor bodies (such as asteroids and comets) to be determined to high precision. A relatively new use for radar has been computing asteroid shapes from their radar signatures. This problem is not trivial – without information on the spin axis of the asteroid, the underlying shape of the body cannot be uniquely determined from its radar image. Furthermore, even if spin information were known, certain peculiarities of radar images, such as the North-South ambiguity, make recovery of the physical shape extremely difficult. The best method to accomplish this is to fit a model of the physical shape to the observed radar image. Unfortunately, this is a computationally intensive problem, potentially involving hundreds to thousands of free parameters and millions of data points.

Given the computational intensity, one might raise the question as to the worth of computing shape information from radar data. The most compelling reason to do so is the fact that radar is not diffraction limited, since it does not require spatial resolution of the object. This means that radar can be used to resolve objects substantially smaller than the beamwidth of the detector used to obtain the images. For example, Arecibo, the primary instrument used for the data presented in this paper, has a beamwidth of more than an arcminute. Yet Arecibo can easily gather shape information to an accuracy of decameters for objects which subtend 20 milliarcseconds.

Radar has other advantages as well. Unlike most observational techniques inside the Solar System, radar does not rely on reflected sunlight. This human-controlled illumination allows for greater flexibility with respect to the observations. Radar also has the ability to probe an object’s sub-surface properties, which can give important information about the object such as porosity, surface-ice content, and surface conductivity.

There are also various reasons why asteroid shape data are so important. An asteroid’s orbital future cannot be accurately determined without information on its shape. The Yarkovsky effect, for example, can drastically change an asteroid’s orbit over many orbital periods. This effect occurs because the rotating body absorbs sunlight and re-emits that light in a non-radial direction, resulting in an impulse that perturbs the asteroid’s orbit. The Yarkovsky effect depends strongly on the shape of the object, since re-emission of absorbed sunlight is a surface phenomenon. Orbital determination of asteroids must take the Yarkovsky effect into account, and this is of special importance for determining impact probabilities for near-Earth asteroids (http://adsabs.harvard.edu/abs/2013DPS....4510608C). By distinguishing between rubble piles and contact binaries, asteroid shapes also help to test hypotheses about orbital histories, as well as assess statistical properties of asteroid populations. Shapes also affect how two-body interactions between asteroids develop – a binary with a non-spherically-symmetric primary will evolve differently than its spherical counterpart.

Asteroid shapes are currently modeled using the **shape** software package. **shape** takes a model for the asteroid, parameterized by both shape and spin parameters, and projects that model into radar space. This projection is compared to observed data for the asteroid, and the model parameters are adjusted in an attempt to minimize the residuals. **shape** uses increasing model complexity to build up a representation of the asteroid, from a basic ellipsoid model that captures gross features to a spherical harmonic model that can represent finer surface elements (see the results in Section ?).

**shape** currently uses a Sequential Parameter Fit (SPF) mechanism to adjust the model following a comparison between the projected radar image and the asteroid data. The SPF iterates through each parameter of the model and minimizes the \(\chi^2\) for that individual parameter using bracketing and Brent’s method [reference: Numerical Recipes]. This process is not only slow, but it also does not guarantee convergence to a local minimum. My work has been to replace the SPF currently implemented in **shape** with a modified Square Root Information Filter, as outlined in section ?.

The Square Root Information Filter (SRIF) was originally developed by Bierman in 1977 [ref]. The algorithm minimizes \(\chi^2\) for time-series data with Gaussian errors, and is based on the Kalman filter. SRIF is more stable, more accurate, and faster than the algorithm currently used in **shape**. SRIF is also more numerically stable (and, in some cases, faster) than a standard Gauss-Newton routine. My implementation of SRIF includes some changes to the original algorithm, which are discussed in section ?.

The fundamental difference between the SRIF algorithm and a classic Gauss-Newton routine is the use of matrix square roots and Householder operations for increased numerical stability.

A classical Gauss-Newton routine (GNR) minimizes the weighted residuals between a model and data with Gaussian noise by determining the direction in parameter space in which \(\chi^2\) decreases fastest. Specifically, suppose one has a set of \(n\) observables, \(\vec{z}\), with weight matrix \(W\), and a model function \(\vec{m}(\vec{x})\), where \(\vec{x}\) is an \(m\)-dimensional parameter vector. Assuming independent data points with Gaussian-distributed errors, the probability of the model matching the data is given by \[p(\vec{m}(\vec{x})\ |\ \vec{z}) \propto p(\vec{z}\ |\ \vec{m}(\vec{x})) \propto \exp\left( -\frac{1}{2}\vec{R}^\intercal W \vec{R}\right)\] where \(\vec{R} = \vec{z} - \vec{m}(\vec{x})\). Maximizing the model probability is therefore equivalent to minimizing \[\chi^2(\vec{x}) = \vec{R}^\intercal W \vec{R}\] Perturbing \(\vec{x}\) by some amount \(\vec{\delta x}\) and minimizing \(\chi^2(\vec{x} + \vec{\delta x})\) to first order in \(\vec{\delta x}\) yields \[(A^\intercal W A)\vec{\delta x} = -A^\intercal W \vec{R}\] where \[A = \frac{\partial \vec{R}}{\partial \vec{x}}\] Thus, changing the parameter vector by \[\vec{\delta x} = -(A^\intercal W A)^{-1} A^\intercal W \vec{R}\] will yield a decrease in \(\chi^2(\vec{x})\). For non-linear systems, this process is repeated until the fractional change in \(\chi^2\) from one iteration to the next drops below a chosen threshold.
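As a sketch of the above, a minimal Gauss-Newton loop might look like the following (NumPy-based; the names `gauss_newton`, `model`, and `jac` are illustrative and not part of **shape**):

```python
import numpy as np

def gauss_newton(z, model, jac, x0, W=None, tol=1e-8, max_iter=50):
    """Minimize chi^2 = R^T W R, with R = z - m(x), by Gauss-Newton.

    model(x) returns m(x); jac(x) returns A = dR/dx = -dm/dx.
    """
    x = np.asarray(x0, dtype=float)
    W = np.eye(len(z)) if W is None else W

    def chi2(v):
        r = z - model(v)
        return r @ W @ r

    prev = chi2(x)
    for _ in range(max_iter):
        R = z - model(x)
        A = jac(x)
        # Normal equations: (A^T W A) dx = -A^T W R
        dx = np.linalg.solve(A.T @ W @ A, -(A.T @ W @ R))
        x = x + dx
        cur = chi2(x)
        # Stop when the fractional chi^2 change drops below tol
        if abs(prev - cur) <= tol * max(prev, 1.0):
            break
        prev = cur
    return x
```

For a linear model the loop converges in a single step, since the first-order expansion of the residual is then exact.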

A major issue with GNR is that each step involves inverting the matrix \((A^\intercal A)\). This matrix has \(m^2\) elements and thus can be quite large for a model with many parameters. Another problem is numerical stability – \((A^\intercal A)\) may be ill-conditioned, in which case taking the inverse can introduce significant numerical error.
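The conditioning problem can be seen directly: forming the normal-equations matrix squares the condition number, \(\mathrm{cond}(A^\intercal A) = \mathrm{cond}(A)^2\). A toy demonstration (a Vandermonde matrix standing in for a Jacobian; nothing here comes from **shape**):

```python
import numpy as np

# A mildly ill-conditioned 8x5 "Jacobian": forming A^T A squares its
# condition number, so a Jacobian with cond ~1e8 already exhausts
# double precision once the normal equations are formed and inverted.
A = np.vander(np.linspace(0.0, 1.0, 8), 5)
print(np.linalg.cond(A))        # condition number of A
print(np.linalg.cond(A.T @ A))  # roughly the square of the above
```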

The Square Root Information Filter (SRIF) gets around the problems inherent in a classical GNR by utilizing matrix square roots and Householder operations to increase the numerical stability when determining \(\delta\vec{x}\) . Instead of minimizing \(\chi^2\), SRIF minimizes \[Q = (\chi^2)^{\frac{1}{2}} = ||W^{\frac{1}{2}} \vec{R}||\] Then, along similar lines as GNR, a change of \(\vec{\delta x}\) is introduced to the parameter vector \(\vec{x}\), and \(Q' = Q(\vec{x}+\vec{\delta x})\) is minimized over this change.

\(Q'\) is smallest when \[||W^{\frac{1}{2}} \vec{R}(\vec{x} + \vec{\delta x})|| \approx ||W^{\frac{1}{2}}(\vec{R}(\vec{x}) + A\vec{\delta x})|| = ||W^{\frac{1}{2}}\vec{R}(\vec{x}) + W^{\frac{1}{2}} A\vec{\delta x}||\] is minimized.

A matrix \(H\) is defined such that \(HW^{\frac{1}{2}} A\) is upper triangular. \(H\) is orthogonal and can be generated as the product of \(m\) Householder operation matrices. Note that the orthogonality of \(H\) guarantees that \[|| HW^{\frac{1}{2}}\vec{R}(\vec{x}) + HW^{\frac{1}{2}} A\vec{\delta x} || = ||H(W^{\frac{1}{2}}\vec{R}(\vec{x}) + W^{\frac{1}{2}} A\vec{\delta x})|| = ||W^{\frac{1}{2}}\vec{R}(\vec{x}) + W^{\frac{1}{2}} A\vec{\delta x}||\] Since \(HW^{\frac{1}{2}} A\) is upper triangular, it can be rewritten as \[\left( \begin{array}{c} A' \\ 0 \\ \vdots \\ 0 \end{array} \right)\] where \(A'\) is an \(m \times m\) matrix. Then rewriting \[HW^{\frac{1}{2}}\vec{R}(\vec{x}) = \left( \begin{array}{c} \vec{R_x'} \\ \vec{R_z'} \end{array} \right)\] where \(\vec{R_x'}\) and \(\vec{R_z'}\) are \(m \times 1\) and \((n-m) \times 1\) arrays, respectively, yields \[Q' = ||\left( \begin{array}{c} \vec{R_x'} + A'\vec{\delta x} \\ \vec{R_z'} \end{array} \right)||\] This is clearly minimized when \[\vec{R_x'} = -A'\vec{\delta x}\] or \[\vec{\delta x} = -A'^{-1} \vec{R_x'}\] Note that since \(A'\) is upper triangular, its inverse is easily computed by back-substitution.
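The whole step can be sketched compactly with a QR factorization, which applies exactly this Householder triangularization. A minimal illustration, assuming independent data with a diagonal weight matrix stored as a vector `w` (the name `srif_step` is illustrative, not **shape**'s implementation):

```python
import numpy as np

def srif_step(R, A, w):
    """One SRIF step for independent data with weight vector w.

    R : residuals z - m(x), shape (n,);  A : dR/dx, shape (n, m).
    numpy's QR uses Householder reflections, playing the role of H.
    """
    sw = np.sqrt(w)
    Q, A1 = np.linalg.qr(A * sw[:, None])  # A1 = A' (m x m, upper triangular)
    Rx = Q.T @ (R * sw)                    # R_x': first m rotated residuals
    # Solve A' dx = -R_x' by back-substitution (A1 is triangular)
    m = len(Rx)
    dx = np.zeros(m)
    for i in range(m - 1, -1, -1):
        dx[i] = -(Rx[i] + A1[i, i + 1:] @ dx[i + 1:]) / A1[i, i]
    return dx
```

The rotated residuals \(\vec{R_z'}\) are never needed for the step itself, which is why the reduced (thin) QR factorization suffices.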

I have implemented two main additions to the standard SRIF. An inherent downside of SRIF is that the number of operations necessary to generate the Householder matrix \(H\) grows like \(m^2(n-m)\). My implementation of SRIF runs the matrix triangularization simultaneously on multiple cores, which results in a significant speed-up. This can be done in a thread-safe manner because each Householder operation functions on a column-by-column basis.

The second addition I have made to the standard SRIF is a secondary \(\chi^2\) minimization over the scaling of \(\vec{\delta{x}}\). That is, \[Q'' = Q(\vec{x} + \alpha \vec{\delta x})\] is minimized over the scalar \(\alpha\), using a grid search spanning six decades. This additional minimization adds a trivial computational cost to the overall minimization of \(\chi^2\), but allows for faster convergence and the possibility of skipping over local minima in \(\chi^2\)-space.
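A minimal version of this scaling search might look like the following (the six-decade span of \(10^{-3}\) to \(10^{3}\) and the grid density are illustrative assumptions, not **shape**'s exact settings):

```python
import numpy as np

def scale_step(chi2, x, dx, n_grid=25):
    """Grid-search the step scaling alpha over six decades.

    chi2 is a callable on parameter vectors; dx is the SRIF step.
    Returns the alpha in [1e-3, 1e3] giving the lowest chi^2.
    """
    alphas = np.logspace(-3, 3, n_grid)
    return min(alphas, key=lambda a: chi2(x + a * dx))
```

In the filter loop, the parameter update then becomes \(\vec{x} \rightarrow \vec{x} + \alpha\,\vec{\delta x}\) with the returned \(\alpha\).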

The SPF routine can currently fit models to data while taking into account a suite of “penalty functions”, which ensure that the generated models are physical. In a way, these penalty functions make the fits operate in a more global context – there may be a local minimum in \(\chi^2\)-space towards which the fitting algorithm would tend, but that minimum can be ruled out *a priori* on physical grounds. These penalty functions include limits on ellipsoid axis ratios, constraints on surface concavities, and limits on the distance between the model center of mass and the image center of mass, among others. I have implemented these penalty functions in the SRIF framework by redefining the residual vector as \[\vec{R}'' = \left( \begin{array}{c} \vec{z} - \vec{m} \\ \vec{p_w} \end{array} \right)\] where \[\vec{p_w} = \left( \begin{array}{c}
p_1\times w_1 \\
\vdots \\
p_N\times w_N \end{array} \right)\] and \(\{p_i\}\), \(\{w_i\}\) are the sets of penalty functions and penalty weights, respectively, with \[A'' = \frac{\partial \vec{R''}}{\partial \vec{x}}\] The algorithm then proceeds as described in section ?, with \(\vec{R}''\) replacing \(\vec{R}\) and \(A''\) replacing \(A\).
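The augmentation can be sketched as follows, using the convention \(A = \partial\vec{R}/\partial\vec{x}\) from the Gauss-Newton section (the finite-difference derivatives and the name `augment` are illustrative assumptions, not **shape**'s implementation):

```python
import numpy as np

def augment(R, A, penalties, weights, x, eps=1e-6):
    """Append weighted penalties p_i * w_i to the residual vector and
    their derivatives to the Jacobian.

    penalties : list of callables p_i(x); weights : list of w_i.
    Derivatives are forward finite differences (for illustration).
    """
    x = np.asarray(x, dtype=float)
    pw = np.array([w * p(x) for p, w in zip(penalties, weights)])
    rows = np.empty((len(pw), len(x)))
    for j in range(len(x)):
        xe = x.copy()
        xe[j] += eps
        pe = np.array([w * p(xe) for p, w in zip(penalties, weights)])
        rows[:, j] = (pe - pw) / eps   # d(p_i w_i)/dx_j
    # R'' stacks the data residuals over p_w; A'' stacks the Jacobians
    return np.concatenate([R, pw]), np.vstack([A, rows])
```

The augmented pair is then fed to the same triangularization-and-back-substitution step as before; the penalties simply act as extra "observations" pulling the fit towards physical models.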

Figure ? shows comparisons of SPF and SRIF fitting a model to simulated data. Note that for certain starting conditions, SPF outperforms SRIF for a large fraction of the total fitting time. SRIF pays an initial overhead cost due to the need to generate a new derivative matrix at each iteration, as well as to perform the Householder operations. These overheads mean that each SRIF iteration takes substantially longer than each SPF iteration. However, if the fit runs long enough, SRIF tends to converge before SPF (if SPF converges at all). Tables ? and ? show overall run statistics for the tests plotted in Figure ?. Note that SRIF performed an order of magnitude faster than SPF in the one test in which SPF converged.

*if new shape has a better convergence (see comments)*: I have run **shape** with the new fitting routine on the asteroid 2000 ET70. ET70’s shape has been fit in the past, using **shape** and the SPF fitting routine (reference shantanu). **shape** with SRIF has converged on a lower \(\chi^2\), as indicated in Figure ?.

*if new shape has a worse convergence*: I have run **shape** with the new fitting routine on the asteroid 2000 ET70. **shape** was run initially with an ellipsoid model, with starting conditions such that the ellipsoid axes were all equal (Figure ?a). The best-fit ellipsoid model (Figure ?b) was then converted into a spherical harmonic model with 200 model components, which was then fit as well (Figure ?c). The initial ellipsoid fit resulted in a \(\chi^2\) drop of \(34\%\); the spherical harmonic fit resulted in a \(\chi^2\) drop of \(13\%\). Figure ? (not yet generated) shows the surface gravity field for ET70 given the final shape. Table ? (not yet generated) gives the moments of inertia for ET70 given the final shape.


Table 1

| Model | \(\frac{\chi_{initial}^2}{\chi_{final}^2}\) | Runtime (sec) | Correct Parameters Found? |
| --- | --- | --- | --- |
| Ellipsoid #1 | 2.5 | 255.0 | No |
| Ellipsoid #2 | 5.83 | 334.0 | No |
| Ellipsoid #3 | 1.16 | 4.0 | No |
| Sph. Harm. #1 | 50.7 | 27264.0 | Yes |
| Sph. Harm. #2 | 8.34 | 4819.0 | No |
| Sph. Harm. #3 | 58.29 | 45163.0 | No |

Run statistics for SPF test fits, using simulated data. These test fits correspond to the plots in Figure ?.

Table 2

| Model | \(\frac{\chi_{initial}^2}{\chi_{final}^2}\) | Runtime (sec) | Correct Parameters Found? |
| --- | --- | --- | --- |
| Ellipsoid #1 | 63.9 | 174.0 | Yes |
| Ellipsoid #2 | 90.0 | 969.0 | Yes |
| Ellipsoid #3 | 75.3 | 737.0 | Yes |
| Sph. Harm. #1 | 67.1 | 2297.0 | Yes |
| Sph. Harm. #2 | 65.0 | 3595.0 | Yes |
| Sph. Harm. #3 | 81.9 | 7206.0 | Yes |

Run statistics for SRIF test fits, using simulated data. These test fits correspond to the plots in Figure ?.

The addition of SRIF to **shape** has improved fitting performance, but additional changes must still be made to allow **shape** to function with real-world data.

The data that are fitted with **shape** are typically split into sets, with each set corresponding to a different observing run (for the same object). Correspondingly, the models fit to these data have both global and local parameters – parameters tied to a particular data set, and parameters tied to the object itself. For example, the offset between the object’s center of mass and the predicted center of mass is date-specific. Intelligent grouping of local and global parameters can potentially lead to a drastic decrease in computation time.

Tests that I have run with the simulated data have indicated that the \(\chi^2\)-space for shape models is not smooth. This implies that a global fitting mechanism, such as simulated annealing or Markov chain Monte Carlo, may be better suited for this problem. These methods can be supplemented by a gradient-descent method like SRIF. A hybrid of these methods may prove to be the optimal solution for this class of problem.

Appendix explaining Householder operations?
