\section{Method}

The Square Root Information Filter (SRIF) was originally developed by Bierman in 1977 [ref]. The algorithm minimizes $\chi^2$ for time-series data with Gaussian errors and is based on the Kalman filter. SRIF is more stable, more accurate, and faster than the algorithm currently used in \textbf{shape}. It is also more numerically stable (and, in some cases, faster) than a standard Gauss-Newton routine. My implementation of SRIF includes some changes to the original algorithm, which are discussed in the \emph{Additions to SRIF} subsection below. \par

The fundamental difference between the SRIF algorithm and a classic Gauss-Newton routine is the use of matrix square roots and Householder operations for increased numerical stability.

\subsection{Gauss-Newton Routine}

A classical Gauss-Newton routine (GNR) minimizes the weighted residuals between a model and data with Gaussian noise by repeatedly solving a linearized least-squares problem for the parameter step. Specifically, suppose one has a set of $m$ observables, $\vec{z}$, with weights $W$, and a model function $\vec{m}(\vec{x})$, where $\vec{x}$ is an $n$-dimensional parameter vector. Assuming independent data points with Gaussian-distributed errors, the probability of the model matching the data is given by
\[p(\vec{m}(\vec{x})\ |\ \vec{z}) \propto p(\vec{z}\ |\ \vec{m}(\vec{x})) \propto \exp\left(-\tfrac{1}{2}\vec{R}^\intercal W \vec{R}\right)\]
where $\vec{R} = \vec{z} - \vec{m}(\vec{x})$. Maximizing the model probability is therefore equivalent to minimizing
\[\chi^2(\vec{x}) = \vec{R}^\intercal W \vec{R}.\]
Perturbing $\vec{x}$ by some amount $\vec{\delta x}$ and minimizing $\chi^2(\vec{x} + \vec{\delta x})$ over $\vec{\delta x}$ yields
\[(A^\intercal W A)\,\vec{\delta x} = A^\intercal W \vec{R}\]
where
\[A = -\frac{\partial \vec{R}}{\partial \vec{x}}.\]
Thus, changing the parameter vector by
\[\vec{\delta x} = (A^\intercal W A)^{-1} A^\intercal W \vec{R}\]
will yield a decrease in $\chi^2(\vec{x})$. For non-linear systems, this process is repeated until the fractional change in $\chi^2$ from one iteration to the next falls below a fiducial threshold. \par

A major issue with GNR is that each step involves inverting the matrix $(A^\intercal W A)$. This matrix has $n^2$ elements and can therefore be quite large for a model with many parameters. Another problem is numerical stability: $(A^\intercal W A)$ may be ill-conditioned, in which case taking its inverse can introduce large numerical errors.
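To make the GNR update concrete, the following is a minimal sketch of a single weighted Gauss-Newton step, written in Python with NumPy rather than taken from the \textbf{shape} code; the function name and the per-point weight vector \texttt{w} (the diagonal of $W$) are illustrative assumptions.

\begin{verbatim}
import numpy as np

def gauss_newton_step(z, m_of_x, A, w):
    """One weighted Gauss-Newton step: solve (A^T W A) dx = A^T W R,
    where W = diag(w), R = z - m(x), and A = -dR/dx."""
    R = z - m_of_x                  # residual vector, shape (m,)
    AtWA = A.T @ (w[:, None] * A)   # n x n normal matrix
    AtWR = A.T @ (w * R)
    # explicitly forming A^T W A is exactly the conditioning hazard
    # discussed above
    return np.linalg.solve(AtWA, AtWR)
\end{verbatim}

Forming $A^\intercal W A$ squares the condition number of $W^{\frac{1}{2}}A$, which is the numerical weakness that the SRIF formulation below is designed to avoid.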
\subsection{Square Root Information Filter}

The Square Root Information Filter (SRIF) avoids the problems inherent in a classical GNR by using matrix square roots and Householder operations to increase the numerical stability of the determination of $\vec{\delta x}$. Instead of minimizing $\chi^2$, SRIF minimizes
\[Q = (\chi^2)^{\frac{1}{2}} = ||W^{\frac{1}{2}} \vec{R}||.\]
Then, along the same lines as GNR, a change $\vec{\delta x}$ is introduced to the parameter vector $\vec{x}$, and $Q' = Q(\vec{x}+\vec{\delta x})$ is minimized over this change. \par

To first order, $\vec{R}(\vec{x} + \vec{\delta x}) \approx \vec{R}(\vec{x}) - A\vec{\delta x}$, so $Q'$ is smallest when
\[||W^{\frac{1}{2}} \vec{R}(\vec{x} + \vec{\delta x})|| \approx ||W^{\frac{1}{2}}\vec{R}(\vec{x}) - W^{\frac{1}{2}} A\vec{\delta x}||\]
is minimized. \par

A matrix $H$ is defined such that $HW^{\frac{1}{2}} A$ is upper triangular. $H$ is orthogonal and can be generated as the product of $n$ Householder operation matrices. Note that the orthogonality of $H$ guarantees that
\[||HW^{\frac{1}{2}}\vec{R}(\vec{x}) - HW^{\frac{1}{2}} A\vec{\delta x}|| = ||H(W^{\frac{1}{2}}\vec{R}(\vec{x}) - W^{\frac{1}{2}} A\vec{\delta x})|| = ||W^{\frac{1}{2}}\vec{R}(\vec{x}) - W^{\frac{1}{2}} A\vec{\delta x}||.\]
Since $HW^{\frac{1}{2}} A$ is upper triangular, it can be written as
\[HW^{\frac{1}{2}} A = \left( \begin{array}{c} A' \\ 0 \end{array} \right)\]
where $A'$ is an $n \times n$ upper-triangular matrix. Writing
\[HW^{\frac{1}{2}}\vec{R}(\vec{x}) = \left( \begin{array}{c} \vec{R_x'} \\ \vec{R_z'} \end{array} \right),\]
where $\vec{R_x'}$ and $\vec{R_z'}$ are $n \times 1$ and $(m-n) \times 1$ arrays, respectively, yields
\[ Q' = \left|\left| \begin{array}{c} \vec{R_x'} - A'\vec{\delta x} \\ \vec{R_z'} \end{array} \right|\right|. \]
This is clearly minimized when
\[ \vec{R_x'} = A'\vec{\delta x}, \]
or
\[ \vec{\delta x} = A'^{-1} \vec{R_x'}. \]
Note that since $A'$ is upper triangular, its inverse is easily applied by back-substitution. \par

\subsection{Additions to SRIF}

I have implemented two main additions to the standard SRIF. An inherent downside of SRIF is that the number of operations needed to generate the Householder matrix $H$ grows like $mn^2$. My implementation runs the matrix triangularization simultaneously on multiple cores, which results in a significant speed-up. This can be done in a thread-safe manner because each Householder operation is applied on a column-by-column basis; a sketch of this scheme is given below. \par

The second addition is a secondary $\chi^2$ minimization over the scaling of $\vec{\delta x}$: that is,
\[Q'' = Q(\vec{x} + \alpha \vec{\delta x})\]
is minimized over $\alpha$. This minimization is done with a grid search over 6 decades. The additional minimization adds a trivial computational cost to the overall minimization of $\chi^2$, but allows for faster convergence and the possibility of skipping over local minima in the $\chi^2$-space.
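As a concrete illustration of the SRIF update derived above, the following sketch (again Python/NumPy, not the \textbf{shape} implementation) triangularizes $W^{\frac{1}{2}}A$ with a Householder-based QR factorization and back-substitutes for $\vec{\delta x}$, never forming $A^\intercal W A$; the function and variable names are illustrative only.

\begin{verbatim}
import numpy as np

def srif_step(z, m_of_x, A, w):
    """SRIF-style step: triangularize W^(1/2) A with Householder
    reflections (LAPACK QR) and solve A' dx = R_x', never forming
    the normal matrix A^T W A."""
    sqw = np.sqrt(w)
    R = z - m_of_x
    B = sqw[:, None] * A          # W^(1/2) A, shape (m, n)
    b = sqw * R                   # W^(1/2) R
    Q, Aprime = np.linalg.qr(B)   # Aprime is the n x n triangular A'
    Rx = Q.T @ b                  # the R_x' block of H W^(1/2) R
    # Aprime is upper triangular, so this solve reduces to
    # back-substitution in a dedicated implementation
    return np.linalg.solve(Aprime, Rx)
\end{verbatim}

Because the factorization acts directly on $W^{\frac{1}{2}}A$, the condition number entering the solve is that of $W^{\frac{1}{2}}A$ rather than its square, which is the source of SRIF's stability advantage over GNR.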
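The column-parallel triangularization can be sketched as follows. This is an illustrative Python threading version, not the multi-core scheme actually used in \textbf{shape}, and the worker count is an arbitrary placeholder; the point is that, for a given reflector, the updates of different trailing columns touch disjoint memory and are therefore thread-safe.

\begin{verbatim}
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def householder_triangularize(B, n_workers=4):
    """Upper-triangularize a copy of B with Householder reflections.
    For each reflector, the updates of the trailing columns are
    independent, so they are farmed out to a thread pool."""
    B = B.astype(float).copy()
    m, n = B.shape
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for k in range(n):
            x = B[k:, k]
            v = x.copy()
            v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
            nv = np.linalg.norm(v)
            if nv == 0.0:
                continue           # column already triangularized
            v /= nv
            def update_column(j, v=v, k=k):
                B[k:, j] -= 2.0 * v * (v @ B[k:, j])
            # each column update writes to a distinct slice of B
            list(pool.map(update_column, range(k, n)))
    return B
\end{verbatim}

In Python the global interpreter lock limits the speed-up of this particular sketch; it is only meant to show where the column-level independence that makes the parallelization thread-safe comes from.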
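Finally, the step-scaling search can be sketched as below. The grid spans six decades as described above, but the number of samples per decade and the hypothetical \texttt{chi2} callable are illustrative assumptions rather than the values used in \textbf{shape}.

\begin{verbatim}
import numpy as np

def best_step_scale(x, dx, chi2, decades=6, per_decade=4):
    """Grid search for the scale alpha minimizing chi^2(x + alpha*dx).
    `chi2` is a callable returning chi^2 for a parameter vector;
    the grid spans `decades` decades below alpha = 1."""
    alphas = np.logspace(-decades, 0, decades * per_decade + 1)
    values = [chi2(x + a * dx) for a in alphas]
    return alphas[int(np.argmin(values))]
\end{verbatim}

A few dozen extra $\chi^2$ evaluations per iteration are negligible next to the $mn^2$ cost of the triangularization, which is why the scaling search adds only a trivial cost while improving convergence.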