adam greenberg edited methods.tex  about 10 years ago

Commit id: 05547608ec2f6a4e0320083d6771f57136307c92


The fundamental difference between the SRIF algorithm and a classic Gauss-Newton routine is the use of matrix square roots and Householder operations for increased numerical stability.

\subsection{Steepest Descent Routine}

A classical Gauss-Newton routine (GNR) minimizes the weighted residuals between a model and data with Gaussian noise by determining the direction in parameter space in which $\chi^2$ is decreasing fastest. Specifically, suppose one has a set of $n$ observables, $\vec{z}$, with weights $W$, and a model function $\vec{m}(\vec{x})$, where $\vec{x}$ is an $m$-dimensional parameter vector. Assuming independent data points with Gaussian-distributed errors, the probability of the model matching the data is given by \[p(\vec{m}(\vec{x})\ |\ \vec{z}) \propto p(\vec{z}\ |\ \vec{m}(\vec{x})) \propto \exp\left( -\frac{1}{2}\vec{R}^\intercal W \vec{R}\right)\] where $\vec{R} = \vec{z} - \vec{m}(\vec{x})$. Therefore, maximizing the model probability is equivalent to minimizing the value \[\chi^2(\vec{x}) = \vec{R}^\intercal W \vec{R}\] Perturbing $\vec{x}$ by some amount, $\vec{\delta x}$, and minimizing $\chi^2(\vec{x})$ over $\vec{\delta x}$ yields \[(A^\intercal A)\vec{\delta x} = A^\intercal \vec{R}\] where \[A = -\frac{\partial \vec{R}}{\partial \vec{x}}\] Thus, changing one's parameter vector by \[\vec{\delta x} = (A^\intercal A)^{-1} A^\intercal \vec{R}\] will yield a decrease in $\chi^2(\vec{x})$. For non-linear systems, this process is repeated until the change in $\chi^2$ from one iteration to the next has dropped below a fiducial fraction. \par A major issue with GNR is that each step involves taking the inverse of the matrix $(A^\intercal A)$. This matrix has $m^2$ elements and thus can be quite large for a model with many parameters. Another problem is numerical stability -- $(A^\intercal A)$ may be ill-conditioned, in which case taking its inverse can introduce numerical errors.

\subsection{Square Root Information Filter}
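As a concrete motivation for the square-root approach, forming the normal matrix $(A^\intercal A)$ squares the condition number of $A$, so the normal equations lose roughly twice as many significant digits as a factorization that works on $A$ directly. A minimal NumPy sketch (the nearly degenerate design matrix below is invented purely for illustration):

```python
import numpy as np

# Invented example: a 3x2 design matrix whose columns are nearly
# parallel, i.e. a nearly degenerate linearized model.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])

cond_A = np.linalg.cond(A)          # condition number of A itself
cond_AtA = np.linalg.cond(A.T @ A)  # condition number of the normal matrix

# cond(A^T A) = cond(A)^2, so inverting the normal matrix amplifies
# rounding errors far more than operating on A directly.
print(cond_A, cond_AtA)
```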

Applying a series of Householder operations, represented by an orthogonal matrix $H$, to the weighted design matrix yields \[HW^{\frac{1}{2}}A = \left( \begin{array}{c} A' \\ 0 \end{array} \right)\] where $A'$ is an $m \times m$ upper-triangular array. Then rewriting \[HW^{\frac{1}{2}}\vec{R}(\vec{x}) = \left( \begin{array}{c} \vec{R_x'} \\ \vec{R_z'} \end{array} \right)\] where $\vec{R_x'}$ and $\vec{R_z'}$ are $m \times 1$ and $(n-m) \times 1$ arrays, respectively, yields \[ Q' = \left( \begin{array}{c} \vec{R_x'} + A'\vec{\delta x} \\ \vec{R_z'} \end{array} \right)\]
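The triangularization step can be sketched with NumPy's Householder-based QR factorization. In this sketch the data are invented and the weights are taken as the identity; it shows that solving the triangular system built from the top $m$ rows of the transformed problem reproduces the normal-equations step without ever forming $(A^\intercal A)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearized problem: n observations, m parameters
# (values invented; unit weights assumed, so W^(1/2) is dropped).
n, m = 6, 2
A = rng.standard_normal((n, m))  # design matrix
R = rng.standard_normal(n)       # residual vector

# Gauss-Newton step from the normal equations: (A^T A) dx = A^T R
dx_normal = np.linalg.solve(A.T @ A, A.T @ R)

# SRIF-style step: Householder QR stacks A into a triangular A'
# over zeros; the same rotations applied to R give its top block R_x'.
Q, Aprime = np.linalg.qr(A)  # reduced QR: Q is n x m, A' is m x m triangular
Rx = Q.T @ R                 # top m rows of the rotated residual vector

# Back-substitution on the triangular A' gives the same step,
# but with condition number cond(A) instead of cond(A)^2.
dx_srif = np.linalg.solve(Aprime, Rx)
```

Because $A = Q A'$ implies $A^\intercal A = A'^\intercal A'$ and $A^\intercal \vec{R} = A'^\intercal \vec{R_x'}$, the two steps agree exactly in exact arithmetic; the triangular solve is simply the better-conditioned route.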