% adam greenberg edited method.tex about 10 years ago
% Commit id: 8fc937b75e46b4657a9f0fa4931938c4bb056d56
\subsection{Steepest Descent Routine}
A classical Gauss-Newton routine (GNR) minimizes the weighted residuals between a model and data with Gaussian noise by stepping in the direction in parameter space in which $\chi^2$ decreases fastest. Specifically, suppose one has a set of $m$ observables, $\vec{z}$, with weights $W$, and a model function $\vec{m}(\vec{x})$, where $\vec{x}$ is an $n$-dimensional parameter vector. Assuming independent data points with Gaussian-distributed errors, the probability of the model matching the data is given by \[p(\vec{m}(\vec{x}) | \vec{z}) \propto p(\vec{z} | \vec{m}(\vec{x})) \propto \exp\left( -\frac{1}{2}\vec{R}^\intercal W \vec{R}\right)\] where $\vec{R} = \vec{z} - \vec{m}(\vec{x})$. Therefore maximizing the model probability is equivalent to minimizing \[\chi^2(\vec{x}) = \vec{R}^\intercal W \vec{R}\] Perturbing $\vec{x}$ by some amount $\vec{\delta x}$ and minimizing $\chi^2(\vec{x})$ over $\vec{\delta x}$ yields \[(A^\intercal A)\vec{\delta x} = A^\intercal \vec{R}\] where \[A = -\frac{\partial \vec{R}}{\partial \vec{x}}\] Thus, changing one's parameter vector by \[\vec{\delta x} = (A^\intercal A)^{-1} A^\intercal \vec{R}\] will yield a decrease in $\chi^2(\vec{x})$.\par
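One GNR step can be sketched numerically as follows. This is a minimal sketch: the exponential model, its Jacobian, the true parameter values, and the unit weights ($W = I$, so $W$ drops out) are illustrative assumptions, not part of the text.

```python
import numpy as np

def gauss_newton_step(x, z, model, jac):
    """One Gauss-Newton step: solve (A^T A) dx = A^T R for dx."""
    R = z - model(x)   # residual vector R = z - m(x)
    A = jac(x)         # A = -dR/dx = dm/dx, the model Jacobian
    # Solve the normal equations directly rather than forming the inverse.
    dx = np.linalg.solve(A.T @ A, A.T @ R)
    return x + dx

# Hypothetical model: m_i = a * exp(b * t_i), with true values a=2, b=0.5.
t = np.linspace(0.0, 1.0, 20)
z = 2.0 * np.exp(0.5 * t)
model = lambda x: x[0] * np.exp(x[1] * t)
jac = lambda x: np.column_stack([np.exp(x[1] * t),
                                 x[0] * t * np.exp(x[1] * t)])

x = np.array([1.5, 0.3])   # initial guess near the true parameters
for _ in range(10):
    x = gauss_newton_step(x, z, model, jac)
```

Each step decreases $\chi^2(\vec{x})$, and for this noiseless problem the iteration converges to the true parameters.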
A major issue with GNR is that each step involves inverting the matrix $(A^\intercal A)$. This matrix has $n^2$ elements and thus can be quite large for a model with many parameters. Another problem is numerical stability: the condition number of $(A^\intercal A)$ is the square of that of $A$, so $(A^\intercal A)$ may be ill-conditioned and taking its inverse can introduce large numerical errors.
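The conditioning problem can be made concrete with a small sketch. The nearly rank-deficient $100\times2$ Jacobian below is a hypothetical example (two almost-degenerate parameters), not taken from the text:

```python
import numpy as np

# Hypothetical 100x2 Jacobian whose second column is almost a
# multiple of the first, i.e. two nearly degenerate parameters.
rng = np.random.default_rng(0)
col = rng.standard_normal(100)
A = np.column_stack([col, col + 1e-7 * rng.standard_normal(100)])

# cond(A^T A) = cond(A)^2: forming the normal-equation matrix squares
# the condition number, roughly doubling the number of significant
# digits lost in any subsequent solve or inversion.
c_A = np.linalg.cond(A)
c_AtA = np.linalg.cond(A.T @ A)
```

Here $A$ alone is already poorly conditioned, and $A^\intercal A$ is worse by many orders of magnitude, which motivates methods that avoid forming it at all.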
\subsection{Square Root Information Filter}
The Square Root Information Filter (SRIF) gets around the problems inherent in a classical GNR by utilizing matrix square roots and Householder operations to increase the numerical stability when determining $\vec{\delta x}$. Instead of minimizing $\chi^2$, SRIF minimizes \[Q = (\chi^2)^{\frac{1}{2}} = ||W^{\frac{1}{2}} \vec{R}||\] Then, along the same lines as GNR, a change $\vec{\delta x}$ is introduced to the parameter vector $\vec{x}$, and $Q' = Q(\vec{x}+\vec{\delta x})$ is minimized over this change. \par
$Q'$ is smallest when \[||W^{\frac{1}{2}} \vec{R}(\vec{x} + \vec{\delta x})|| \approx ||W^{\frac{1}{2}} \vec{R} - W^{\frac{1}{2}} A\vec{\delta x}||\] is minimized.
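A minimal sketch of this minimization, assuming a diagonal weight matrix and illustrative values for $A$, $\vec{R}$, and the weights: Householder-based QR factorization (which is what \texttt{np.linalg.qr} applies) triangularizes $W^{\frac{1}{2}}A$ directly, so the ill-conditioned product $A^\intercal A$ is never formed.

```python
import numpy as np

def srif_step(A, R, w):
    """Minimize ||W^(1/2) R - W^(1/2) A dx|| over dx via Householder QR.

    w holds the diagonal of W (a diagonal weight matrix is assumed
    in this sketch).
    """
    sw = np.sqrt(w)
    Aw = sw[:, None] * A      # W^(1/2) A
    Rw = sw * R               # W^(1/2) R
    # QR uses orthogonal Householder reflections, which preserve the
    # norm being minimized while triangularizing the weighted Jacobian.
    Q, T = np.linalg.qr(Aw)
    # The minimizer satisfies T dx = Q^T Rw; solve by back-substitution.
    return np.linalg.solve(T, Q.T @ Rw)

# Illustrative inputs: 30 observables, 3 parameters, uniform weights.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3))
R = rng.standard_normal(30)
w = np.full(30, 2.0)
dx = srif_step(A, R, w)
```

On a well-conditioned problem this $\vec{\delta x}$ agrees with the normal-equation solution of the previous subsection; the advantage of the square-root form appears when $A^\intercal A$ is nearly singular.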