\section{Fisher Information Analysis}
\label{sec:FI}

When fitting a model to observed data, one is evaluating the likelihood of the observed data, \(\{y\}\), conditioned on a hypothesis, typically given in the form of a parametric model \(f(\{a\})\). In this scenario, one is asking the question, “What is the probability of each of my data given the set of parameters \(\{a\}\)?” In addition to seeking the parameters that maximize the likelihood of observing the data, one is often also interested in the sensitivity of the data to the model parameters, with the aim of placing confidence intervals on the “best-fitting” parameters. In this case one is asking, “What is the sensitivity of my data to small changes in the model parameters?”

The Fisher information formalism provides a means of addressing this question. The diagonal elements of the inverse of the Fisher information matrix give the variance of each parameter, and the off-diagonal elements give the covariances between parameters. The magnitudes of these variances and covariances depend on both the nature of the model and the uncertainties in the data.

In the special case that the observed data are independently and identically normally distributed about the model, the Fisher information matrix is simply the inverse of the covariance matrix that is a common byproduct of a traditional least-squares analysis. For \(N\) data points \(\{ y \}\) and model points \(\{ y_{\rm mod} \}\), we define the metric

\[\chi^2 = \sum\limits_{k=1}^{N} \sum\limits_{l=1}^{N} \left( y_k - y_{k,\text{mod}} \right) \mathcal{B}_{kl} \left( y_l - y_{l,\text{mod}} \right),\]

where \(\mathcal{B}_{kl} = \delta_{kl} \sigma^{-2}\) since we assume uncorrelated errors; under this assumption,

\[\chi^2 = \sigma^{-2} \sum\limits_{k=1}^{N} \left(y_k - y_{k,\text{mod}} \right)^2.\]
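As a quick numerical illustration (with made-up data and a hypothetical constant \(\sigma\)), the following Python snippet verifies that the double sum with \(\mathcal{B}_{kl} = \delta_{kl}\sigma^{-2}\) collapses to the single-sum form above:

```python
import math

# Made-up data: four measurements of a constant unit-flux model with
# sigma = 0.02 (all values here are illustrative assumptions).
y     = [1.02, 0.97, 1.01, 0.95]
y_mod = [1.00, 1.00, 1.00, 1.00]
sigma = 0.02

def chi2_full(y, y_mod, sigma):
    # General quadratic form with B_kl = delta_kl / sigma^2
    n = len(y)
    B = [[1.0 / sigma**2 if k == l else 0.0 for l in range(n)] for k in range(n)]
    return sum((y[k] - y_mod[k]) * B[k][l] * (y[l] - y_mod[l])
               for k in range(n) for l in range(n))

def chi2_diag(y, y_mod, sigma):
    # Collapsed single-sum form for uncorrelated errors
    return sum((yk - mk) ** 2 for yk, mk in zip(y, y_mod)) / sigma**2

assert abs(chi2_full(y, y_mod, sigma) - chi2_diag(y, y_mod, sigma)) < 1e-9
likelihood = math.exp(-0.5 * chi2_diag(y, y_mod, sigma))  # L = exp(-chi^2 / 2)
```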

Furthermore, since we assume the errors are normally distributed as \(\mathcal{N}(0, \sigma^2)\), the likelihood of the data is \(\mathcal{L} \propto \exp{\left[ -\chi^2 / 2 \right]}\) (see \citet{Gould2003}). Following \citet{Vallisneri2008}, the Fisher information matrix \(\mathbf{B}\) is defined by

\[\begin{aligned} B_{ij} &= \left\langle \left( \frac{\partial}{\partial p_i} \log{\mathcal{L}} \right) \left( \frac{\partial}{\partial p_j} \log{\mathcal{L}} \right) \right\rangle \\ &= \left\langle \sigma^{-4} \sum\limits_{k=1}^N \sum\limits_{l=1}^N \left( y_k - y_{k,\text{mod}} \right) \left( y_l - y_{l,\text{mod}} \right) \frac{\partial y_{k,\text{mod}}}{\partial p_i} \frac{\partial y_{l,\text{mod}}}{\partial p_j} \right\rangle \\ &= \sigma^{-2} \sum\limits_{k=1}^N \left( \frac{\partial y_{k,\text{mod}}}{\partial p_i} \right) \left( \frac{\partial y_{k,\text{mod}}}{\partial p_j} \right), \end{aligned}\]

where \(p\) is the set of parameters \(p = \{t_c, \tau, T, \delta, f_0\}\), and the final step uses the expectation value \(\langle (y_k - y_{k,\text{mod}}) (y_l - y_{l,\text{mod}}) \rangle = \delta_{kl} \sigma^2\) for uncorrelated errors. Writing the model function as \(y_{\text{mod}} = F\), we have equivalently

\[B_{ij} = \sigma^{-2} \sum\limits_{k=1}^{N} \left[ \frac{\partial}{\partial p_i} F(t_k~;~\{p\}) \right] \left[ \frac{\partial}{\partial p_j} F(t_k~;~\{p\}) \right] \label{eqn:FisherElementSum}\]

where \(F\) is one of \(F_{lb1}\) and \(F_{lb2}\) as appropriate for the regime considered. Tables \ref{tab:DerivativesF1} and \ref{tab:DerivativesF2} give partial derivatives for the five regions of the binned lightcurve model for \(F_{lb1}\) and \(F_{lb2}\), respectively.
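As a numerical sketch of Equation \ref{eqn:FisherElementSum}, the Python snippet below assembles a Fisher matrix from central finite-difference partial derivatives. The Gaussian-dip model and the parameter and noise values are illustrative assumptions only, stand-ins for \(F_{lb1}\) and \(F_{lb2}\) rather than the paper's models:

```python
import math

# Toy model (an assumption, not F_lb1/F_lb2): a Gaussian dip in a flat
# baseline, parameterized by mid-time, width, depth, and baseline flux.
def model(t, p):
    tc, T, delta, f0 = p
    return f0 * (1.0 - delta * math.exp(-0.5 * ((t - tc) / T) ** 2))

def fisher_matrix(times, p, sigma, h=1e-6):
    n = len(p)
    derivs = []                     # derivs[i][k] = dF/dp_i evaluated at t_k
    for i in range(n):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        derivs.append([(model(t, up) - model(t, dn)) / (2.0 * h) for t in times])
    # B_ij = sigma^-2 * sum_k (dF/dp_i)(dF/dp_j), symmetric by construction
    return [[sum(di * dj for di, dj in zip(derivs[i], derivs[j])) / sigma**2
             for j in range(n)] for i in range(n)]

times = [0.01 * k for k in range(-200, 201)]        # uniform time sampling
B = fisher_matrix(times, p=[0.0, 0.5, 0.01, 1.0], sigma=1e-3)
```

For a well-constrained model the resulting matrix is symmetric and positive definite, so its inverse (the covariance matrix) is well defined.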

We assume that the data points are sampled uniformly at a rate \(\Gamma\), beginning at time \(t_0\) and lasting for a total duration \(T_{tot}\). Like C08, we approximate the finite sums of Equation \ref{eqn:FisherElementSum} by an integral over time, assuming that \(\Gamma\) is large enough to sufficiently sample the transit curve:

\[B_{ij} = \frac{\Gamma}{\sigma^2} \int\limits_{t_0}^{t_0+T_{tot}} \left[ \frac{\partial}{\partial p_i} F(t~;~\{p\}) \right] \left[ \frac{\partial}{\partial p_j} F(t~;~\{p\}) \right] dt. \label{eqn:FisherElementIntegral}\]
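The quality of this sum-to-integral approximation can be checked numerically. The sketch below compares the discrete Fisher sum for the diagonal element \(B_{t_c t_c}\) against \(\Gamma/\sigma^2\) times the closed-form integral, using a toy Gaussian-dip model (an illustrative assumption, not one of the paper's \(F_{lb}\) models):

```python
import math

# Toy model F = f0 * (1 - delta * exp(-(t - tc)^2 / (2 T^2))); all parameter
# values below are illustrative assumptions.
tc, T, delta, f0, sigma = 0.0, 0.5, 0.01, 1.0, 1.0
gamma = 1000.0                      # uniform sampling rate (points per unit time)
dt = 1.0 / gamma
t0, t_tot = -5.0, 10.0              # window wide enough to contain the dip

def dF_dtc(t):
    # Analytic partial derivative dF/dt_c of the toy model
    return -f0 * delta * (t - tc) / T**2 * math.exp(-0.5 * ((t - tc) / T) ** 2)

# Finite Fisher sum over the sampled time points...
B_sum = sum(dF_dtc(t0 + k * dt) ** 2 for k in range(int(t_tot * gamma))) / sigma**2

# ...versus (Gamma / sigma^2) times the closed-form integral of (dF/dt_c)^2,
# which for this toy model is (f0 * delta)^2 * sqrt(pi) / (2 T).
B_int = (gamma / sigma**2) * f0**2 * delta**2 * math.sqrt(math.pi) / (2.0 * T)

assert abs(B_sum - B_int) / B_int < 1e-3
```

The agreement improves as \(\Gamma\) grows relative to the characteristic timescale of the lightcurve features, which is the regime assumed above.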

For phase-folded data spanning several transits, we assume \(T_{tot} = P_{orb}\). We can also define an “effective” sampling rate, \(\Gamma_{eff}\), which can be at most \(N \Gamma\), where \(N\) is the number of transits observed. More rigorously, we can define \(\Gamma_{eff}\) as the reciprocal of the average time between consecutive phase-folded time points (see Section \ref{sec:PhaseSampling} for a discussion of the effects of phase sampling on \(\Gamma_{eff}\)). In either case, \(\Gamma_{eff}\) can be expressed independently of the exposure time \(t_{exp}\). Equation \ref{eqn:FisherElementIntegral} then becomes

\[B_{ij} = \frac{\Gamma_{eff}}{\sigma^2} \int\limits_{t_0}^{t_0+P_{orb}} \left[ \frac{\partial}{\partial p_i} F(t~;~\{p\}) \right] \left[ \frac{\partial}{\partial p_j} F(t~;~\{p\}) \right] dt. \label{eqn:FisherElementIntegralFolded}\]

Evaluating Equation \ref{eqn:FisherElementIntegral} with the partial derivatives given yields the Fisher matrices in Equations \ref{eqn:FisherMatrix1} and \ref{eqn:FisherMatrix2}. The full covariance matrix for each model is found by taking the matrix inverse of the Fisher matrix (see Equations \ref{eqn:cov1} and \ref{eqn:cov2}).
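The final inversion step can be sketched as follows. The \(2 \times 2\) matrix below is a made-up example, not Equation \ref{eqn:FisherMatrix1} or \ref{eqn:FisherMatrix2}; it simply illustrates how the inverse of a Fisher matrix yields parameter variances (diagonal) and covariances (off-diagonal):

```python
import math

# Invert a 2x2 Fisher matrix analytically; for the paper's 5x5 matrices a
# general linear-algebra routine would be used instead.
def invert_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c             # must be nonzero (non-degenerate parameters)
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[4.0, 1.0],
     [1.0, 2.0]]                    # illustrative symmetric Fisher matrix
C = invert_2x2(B)                   # covariance matrix C = B^-1
sigmas = [math.sqrt(C[i][i]) for i in range(2)]   # 1-sigma uncertainties
rho = C[0][1] / (sigmas[0] * sigmas[1])           # parameter correlation
```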