Burst Weights

\label{sec:burstweights_theory}

Theory

Freely-diffusing molecules across a Gaussian excitation volume give rise to a burst size distribution that is exponentially distributed. In a static FRET population, burst counts in the acceptor channel can be modeled as a binomial random variable (RV) with success probability equal to the population PR and number of trials equal to the burst size \(n_{d}+n_{a}\). Similarly, the PR of each burst \(E_{i}\) (\(i\) being the burst index) is simply a binomial divided by the number of trials, with variance reported in eq. \ref{eq:E_var}.

\begin{equation} \label{eq:E_var} \label{eq:E_var}\operatorname{Var}(E_{i})=\frac{E_{p}\,(1-E_{p})}{n_{ti}}\\ \end{equation}

Bursts with higher counts, provide more accurate estimations of the population PR, since their PR variance is smaller (eq. \ref{eq:E_var}). Therefore, in estimating the population PR we need to ”focus” on bigger bursts. Traditionally, this is accomplished by merely discarding bursts below a size-threshold. In the following paragraphs we demonstrate how, by proper weighting bursts, is possible to obtains optimal estimates of PR which takes into account the information of the entire burst population.

According to the Cramer-Rao lower bound (eq. \ref{eq:cramer_rao}), the Fisher information \(\mathcal{I}(\theta)\) sets a lower bound on the variance of any statistics \(\hat{p}\) of a RV \(\theta\).

\begin{equation} \label{eq:cramer_rao} \label{eq:cramer_rao}\operatorname{Var}\left(\hat{p}\right)\geq\frac{1}{\mathcal{I}(\theta)}\\ \end{equation}

When the statistics \(\hat{p}\) is an unbiased estimator of a distribution parameter and the equality holds in eq. \ref{eq:cramer_rao}, the estimator is a minimum-variance unbiased (MVUB) estimator and it is said to be efficient (meaning that it does an optimal use the information contained in the sample to estimate the parameter).

A population of \(N\) bursts can be modeled by a set of \(N\) binomial variables with same success probability \(E_{p}\) and varying number of successes equal to the burst size. An estimator for \(E_{p}\) can be constructed noticing that the sum of binomial RV with same success probability is still a binomial (with number of trials equal to the sum of the number of trials). Taking the sum of acceptor counts over all bursts divided by the total number of photons as in eq. \ref{eq:E_estim}, we obtain an estimator \(\hat{E}\) of the proportion of successes.

\begin{equation} \label{eq:E_estim} \label{eq:E_estim}\hat{E}=\frac{\sum_{i}n_{ai}}{\sum_{i}n_{ti}}\\ \end{equation}

The variance of \(\hat{E}\) (eq. \ref{eq:E_variance}) is equal to the inverse of the Fisher information \(\mathcal{I}(\hat{E})\) and therefore \(\hat{E}\) is a MVUB estimator for \(E_{p}\).

\begin{equation} \label{eq:E_variance} \label{eq:E_variance}\operatorname{Var}(\hat{E})=\frac{E_{p}(1-E_{p})}{\sum_{i}n_{ti}}=\frac{1}{\mathcal{I}(\hat{E})}\\ \end{equation}

We can finally verify that \(\hat{E}\) is equal to the weighted average of the bursts PR \(E_{i}\) (eq. \ref{eq:E_wmean}), with weights proportional to the burst size (eq. \ref{eq:weights}).

\begin{equation} \label{eq:weights} \label{eq:weights}w_{i}=\frac{n_{ti}}{\sum_{i}n_{ti}}\\ \end{equation}
\begin{equation} \label{eq:E_wmean} \label{eq:E_wmean}\hat{E}_{w}=\frac{1}{N}\sum_{i}w_{i}E_{i}=\frac{1}{N}\frac{\sum_{i}n_{ti}\frac{n_{ai}}{n_{ti}}}{\sum_{i}n_{ti}}=\hat{E}\\ \end{equation}

Since \(\hat{E}\) is the MVUB estimator, any other estimator of \(E_{p}\) (in particular the unweighted mean of \(E_{i}\)) will have a larger variance.

We can extend these consideration of optimal weights for the PR estimator to the FRET distribution plot (histograms or KDEs). Building an unweighted histogram (and fitting the peak) is analogous to estimating the \(E_{p}\) with an unweighted average. Conversely, building the FRET histogram using the burst size as weights is equivalent to using the MVUB estimator for \(E_{p}\).

Weighted FRET estimator

Here we report a simple verification of the results of previous section, namely that a weighted mean of \(E_{i}\) is the estimator with minimal variance of \(E_{p}\). For this purpose, we generated a static FRET population of 100 bursts by simply extracting burst-sizes from an exponential distribution (\(\lambda=10\)) and acceptor counts from a binomial distribution (\(E_{p}=0.2\)). By repeatedly fitting the population parameter \(E_{p}\) using a size-weighted and unweighted average, we verified that the former has systematically lower variance of the latter as predicted by the theory (in the current example the unweighted estimator has \(28.6\,\%\) higher variance). Note that this result holds for any arbitrary distribution of burst sizes. The full simulation including exponential and gamma-distributed burst sizes is reported in the accompanying Jupyter notebook (link).

Weighted FRET histogram

The effect of weighting the FRET histogram is here illustrated with a simulation of a mixture of two static FRET populations and then with experimental data.

We performed a realistic simulation of a static mixture of two FRET populations starting from 3-D Brownian motion diffusion of \(N\) particles excited by a numerically computed (non-Gaussian) PSF. Input parameters of the simulation include diffusion coefficient, particle brightness, the two FRET efficiencies, as well as detectors DCR. The simulation is performed with the open source software PyBroMo \cite{Ingargiola_2016} which creates smFRET data files (i.e. timestamps and detectors arrays) in Photon-HDF5 format \cite{Ingargiola2016}. The simulated data file is processed with FRETBursts performing burst search, and only a minimal burst size selection of with threshold of 10 photons. The resulting weighted and unweighted FRET histograms are reported in figure \ref{fig:weight_fret_sim}. We notice that the use of the weights results in better definition of FRET peaks.

As a final comparison, we report the weighted and unweighted FRET histogram of an experimental FRET population from measurement of a di-labeled dsDNA sample. Figure \ref{fig:weight_fret_meas} show a comparison of a FRET histogram obtained from the same burst with and without weights. The burst selection is obtained applying a burst size threshold of 10 counts (after background correction), in order to filter the extreme low-end of the burst size distribution.

The use of size-weighted FRET histograms is a simple way to obtain a representation of FRET distribution that maintains high power of resolving FRET peaks while including the full burst population and thus reducing statistical noise.

As a final remark, note that when increasing the size-threshold for burst selection the difference between weighted and unweighted FRET histograms tends to zero because the relative difference in burst weights in the selected burst becomes smaller (i.e. tends to 1, meaning equal weights).