
Antonino Ingargiola Remove SI text, leave only a caption for each SI file. about 8 years ago

Commit id: e0d88daed526b40cea242884f7ca83f4c232f20b


contains links to the various resources. Installation instructions can be found in the Reference Documentation (\href{http://fretbursts.readthedocs.org/en/latest/getting_started.html}{link}). A description of FRETBursts execution using Jupyter notebooks is reported in~\ref{sec:notebook}. % SI_link
Detailed information on development style, testing strategies and contribution guidelines is reported in~\ref{sec:dev}. % SI_link
Finally, to facilitate evaluation and comparison with other software, we set up an online service that allows executing FRETBursts without requiring any installation on the user's computer (\href{https://github.com/tritemio/FRETBursts_notebooks#run-online}{link}).

so that the bursts with the largest sizes will have the largest weights. Using size as weights (instead of any other monotonically increasing function of size) can be justified by noting that the variance of the burst proximity ratio (PR) is inversely proportional to the burst size (see~\ref{sec:burstweights_theory} for details). % SI_link
In general, a weighting scheme is used for building efficient estimators for a population parameter (e.g. the population FRET efficiency $E_p$).

\textit{Background estimation} section of the μs-ALEX tutorial (\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#Background-estimation}{link}). Finally, it is possible to use a slower but rigorous approach for finding the optimal threshold as described in~\ref{sec:bg_opt_th}. % SI_link
FRETBursts provides two kinds of plots to represent the background. One shows the histograms of inter-photon delays compared to the fitted exponential distribution, shown in

\end{lstlisting}
This command reflects the general form of plotting commands in FRETBursts as described in~\ref{sec:plotting}. % SI_link
Here we only note that the argument \verb|period| is an integer specifying the background period to be plotted (when omitted, the default is 0, i.e. the first period). Figure~\ref{fig:bg_dist_all} allows quick identification of pathological cases where the

\section{Conclusions} \label{sec:conclusions}
FRETBursts is an open source and openly developed (see~\ref{sec:dev}) implementation % SI_link
of established smFRET burst analysis methods made available to the single-molecule community. It implements several novel concepts which improve the analysis results, such as

\beginsupplement
\section{Supporting Information}
\subsection{Notebook Workflow}
\paragraph*{S1 Appendix.} \label{sec:notebook} {\bf Notebook Workflow.} A description of the notebook workflow used by FRETBursts.

FRETBursts has been developed with the goal of facilitating computational reproducibility of the performed data analysis~\cite{Buckheit_1995}. For this reason, the preferred way of using FRETBursts is by executing one of the tutorials, which are in the form of Jupyter notebooks~\cite{Shen_2014}. Jupyter (formerly IPython) notebooks are web-based documents which contain both code and rich text (including equations, hyperlinks, figures, etc.). FRETBursts tutorials are notebooks which can be re-executed, modified or used to process new data files with minimal modifications. The ``notebook workflow''~\cite{Shen_2014} not only facilitates the description of the analysis (by integrating the code in a rich document) but also greatly enhances its reproducibility by storing an execution trail that includes software versions, input files, parameters, commands and all the analysis results (text, figures, tables, etc.).

The Jupyter Notebook environment streamlines FRETBursts execution (compared to a traditional script and terminal based approach) and allows FRETBursts to be used even without prior Python knowledge. The user only needs to become familiar with the notebook graphical environment in order to navigate and run the notebooks. A list of all FRETBursts notebooks can be found in the \verb|FRETBursts_notebooks| repository on GitHub (\href{https://github.com/tritemio/FRETBursts_notebooks}{link}). Finally, we provide a service to run FRETBursts notebooks online, without requiring any software installation (\href{https://github.com/tritemio/FRETBursts_notebooks#run-online}{link}).
\subsection{Development and Contributions}
\paragraph*{S2 Appendix.} \label{sec:dev} {\bf Development and Contributions.} A description of FRETBursts development philosophy and techniques as well as how to contribute to the project.

Errors are an inevitable reality in any reasonably complex software~\cite{Merali_2010,Soergel_2015}. It is therefore critical to implement countermeasures to minimize the probability of introducing bugs and their potential impact~\cite{Prli__2012, Wilson_2014}. We strove to follow modern software development best practices, which are summarized below.

FRETBursts (and the entire Python ecosystem it depends on) is open source and the source code is fully available for any scientist to study, review and modify. The open source nature of FRETBursts and of the Python ecosystem not only makes it a more transparent, reviewable platform for scientific data analysis, but also allows leveraging state-of-the-art online services such as GitHub (\href{https://github.com}{link}) for hosting, issue tracking and code reviews, TravisCI (\href{https://travis-ci.org}{link}) and AppVeyor (\href{http://www.appveyor.com/}{link}) for continuous integration (i.e. automated test suite execution on multiple platforms after each commit) and \href{https://readthedocs.org/}{ReadTheDocs.org} for automatic documentation building and hosting. All these services would be extremely costly, if available at all, for proprietary software or platforms~\cite{Freeman_2015}.

We highly value source code readability, a property which can reduce the number of bugs by making the code easier to understand and verify. For this purpose, the FRETBursts code base is well commented (with comments representing over 35\% of the source code), follows the PEP8 Python code style rules (\href{https://www.python.org/dev/peps/pep-0008/}{link}), and has docstrings in napoleon format (\href{http://sphinxcontrib-napoleon.readthedocs.org/}{link}).

Reference documentation is built with Sphinx (\href{http://sphinx-doc.org/}{sphinx-doc.org}) and all API documents are automatically generated from docstrings. On each commit, documentation is automatically built and deployed on \href{https://readthedocs.org/}{ReadTheDocs.org}.

Unit tests cover most of the core algorithms, ensuring consistency and minimizing the probability of introducing bugs. The continuous integration services execute the full test suite on each commit on multiple platforms, immediately reporting errors. As a rule, whenever a bug is discovered, the fix also includes a new test to ensure that the same bug does not happen in the future. In addition to the unit tests, we include a regression-test notebook (\href{https://github.com/tritemio/FRETBursts/blob/master/notebooks/dev/tests/FRETBursts%20-%20Regression%20tests.ipynb}{link}) to easily compare numerical results between two versions of FRETBursts. Additionally, the tutorials themselves are executed before each release as an additional test layer to ensure that no errors or regressions are introduced.

FRETBursts is openly developed using the GitHub platform. The authors encourage users to use GitHub issues for questions, discussions and bug reports, and to submit patches through GitHub pull requests. Contributors of any level of expertise are welcome in the project and publicly acknowledged. Contributions can be as simple as pointing out deficiencies in the documentation but can also be bug reports or corrections to the documentation or code. Users willing to implement new features are encouraged to open an Issue on GitHub and to submit a Pull Request. The open source nature of FRETBursts guarantees that contributions will become available to the entire single-molecule community.

\subsection{Timestamps and Burst Data} \label{sec:burststimes}
\paragraph*{S3 Appendix.} \label{sec:burststimes} {\bf Timestamps and Burst Data.} General concepts of how timestamps and bursts data are stored and handled in FRETBursts.

Beyond providing prepackaged functions for established methods, FRETBursts also provides the infrastructure for exploring new analysis approaches. Users can easily get timestamps (or selection masks) for any photon stream. Core burst data (start and stop times, indexes and derived quantities for each burst) are stored in \verb|Bursts| objects (\href{http://fretbursts.readthedocs.org/en/latest/burstsearch.html}{link}). This object provides a simple and well-tested interface (100\% unit-test coverage) to access and manipulate burst data. \verb|Bursts| are created from a sequence of start/stop times and indexes, while all other fields are automatically computed. The methods of \verb|Bursts| allow recomputing indexes relative to a different photon selection or recomputing start and stop times relative to a new timestamps array. Additional methods perform fusion of nearby bursts or combination of two sets of bursts (time intersection or union). This functionality is used for example to implement the DCBS. In conclusion, \verb|Bursts| efficiently implements all the common operations performed with burst data, providing an easy-to-use interface and well-tested algorithms. Leveraging \verb|Bursts| methods, users can implement new types of analysis without wasting time implementing (and debugging) standard manipulation routines.

Examples of working directly with timestamps, masks (i.e. photon selections) and burst data are provided in one of the FRETBursts notebooks (\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Working%20with%20timestamps%20and%20bursts.ipynb}{link}). Section~\nameref{sec:bva} provides a complete example on using FRETBursts to implement custom burst analysis techniques.
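To make the idea of fusing nearby bursts concrete, the following is a minimal NumPy sketch and not the FRETBursts implementation. The function name \verb|fuse_bursts| and the parameter \verb|min_gap| are hypothetical, and the sketch assumes bursts are sorted by start time and do not overlap.

```python
import numpy as np

def fuse_bursts(start, stop, min_gap):
    """Fuse consecutive bursts separated by less than `min_gap`.

    Hypothetical helper for illustration only (not part of FRETBursts).
    Assumes `start` and `stop` are sorted and bursts do not overlap.
    """
    start = np.asarray(start)
    stop = np.asarray(stop)
    if start.size == 0:
        return start.copy(), stop.copy()
    # Gap between the end of burst i and the start of burst i+1
    gaps = start[1:] - stop[:-1]
    # True where a new (fused) burst begins
    begins = np.concatenate(([True], gaps >= min_gap))
    # True where a (fused) burst ends, i.e. right before the next beginning
    ends = np.concatenate((begins[1:], [True]))
    return start[begins], stop[ends]

# Example with arbitrary time units: bursts 2 and 3 are 1 unit apart
start = np.array([0, 10, 12, 30])
stop = np.array([5, 11, 20, 35])
fused_start, fused_stop = fuse_bursts(start, stop, min_gap=2)
print(fused_start, fused_stop)
```

Only start and stop times are handled here; a full implementation would also need to carry the corresponding start and stop indexes.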
\paragraph{Python details} Timestamps are stored in the \verb|Data| attribute \verb|ph_times_m|, which is a list of arrays, one array per excitation spot. In single-spot measurements the full timestamps array is accessed as \verb|Data.ph_times_m[0]|. To get timestamps of arbitrary photon streams, users can call \verb|Data.get_ph_times| (\href{http://fretbursts.readthedocs.org/en/latest/data_class.html?highlight=get_ph_times#fretbursts.burstlib.Data.get_ph_times}{link}). Photon streams are selected from the full (all-photons) timestamps array using boolean masks, which can be obtained by calling \verb|Data.get_ph_mask| (\href{http://fretbursts.readthedocs.org/en/latest/data_class.html?highlight=get_ph_mask#fretbursts.burstlib.Data.get_ph_mask}{link}).

All burst data (e.g. start-stop times and indexes, burst duration, etc.) are stored in \verb|Bursts| objects. For uniformity, the burst start-stop indexes always refer to the all-photons timestamps array, regardless of the photon stream used for burst search. \verb|Bursts| objects internally store only start and stop times and indexes. The other \verb|Bursts| attributes (duration, photon counts, etc.) are computed on-the-fly when requested (using class properties), thus minimizing the object state. \verb|Bursts| supports iteration with performance similar to iterating through rows of 2D row-major numpy arrays.

\subsection{Plotting \texttt{Data}}
\paragraph*{S4 Appendix.} \label{sec:plotting} {\bf Plotting \texttt{Data}.} A description of the syntax used to perform plots in FRETBursts.

FRETBursts uses matplotlib~\cite{matplotlib} and seaborn~\cite{seaborn} to provide a wide range of built-in plot functions (\href{http://fretbursts.readthedocs.org/en/latest/plots.html}{link}) for \verb|Data| objects. The plot syntax is the same for both single and multi-spot measurements.
The majority of plot commands are called through the wrapper function \verb|dplot|. For example, to plot a timetrace of the photon data, type:
\begin{lstlisting}
dplot(d, timetrace)
\end{lstlisting}
The function \verb|dplot| is the generic plot function, which creates the figure and handles details common to all the plotting functions (for instance, the title). \verb|d| is the \verb|Data| variable and \verb|timetrace| is the actual plot function, which operates on a single channel. In multispot measurements, \verb|dplot| creates one subplot for each spot and calls \verb|timetrace| for each channel. All built-in plot functions which can be passed to \verb|dplot| are defined in the \verb|burst_plot| module (\href{http://fretbursts.readthedocs.org/en/latest/plots.html}{link}).

\paragraph{Python details} When FRETBursts is imported, all plot functions are also imported. To facilitate finding the plot functions through auto-completion, their names start with a standard prefix indicating the plot type. The prefixes are: \verb|timetrace| for binned timetraces of photon data, \verb|ratetrace| for rates of photons as a function of time (non-binned), \verb|hist| for functions plotting histograms and \verb|scatter| for scatter plots. Additional plots can be easily created directly with matplotlib.

By default, in order to speed up batch processing, FRETBursts notebooks display plots as static images using the \textit{inline} matplotlib backend. Users can switch to interactive figures inside the browser by activating the interactive backend with the command \verb|%matplotlib notebook|. Another option is displaying figures in a new standalone window using a desktop graphical library such as QT4. In this case, the command to be used is \verb|%matplotlib qt|. A few plot functions, such as \verb|timetrace| and \verb|hist2d_alex|, have interactive features which require the QT4 backend.
As an example, after switching to the QT4 backend the following command:
\begin{lstlisting}
dplot(d, timetrace, scroll=True, bursts=True)
\end{lstlisting}
\noindent will open a new window with a timetrace plot with an overlay of bursts and a horizontal scroll-bar for quick ``scrolling'' through time. The user can click on a burst to have the corresponding burst info printed in the notebook. Similarly, calling the \verb|hist2d_alex| function with the QT4 backend allows selecting an area on the E-S histogram using the mouse.
\begin{lstlisting}
dplot(ds, hist2d_alex, gui_sel=True)
\end{lstlisting}
The values which identify the region are printed in the notebook and can be passed to the function \verb|select_bursts.ES| to select bursts inside that region (see section~\nameref{sec:burstsel}).

\subsection{Background Estimation With Optimal Threshold}
\paragraph*{S5 Appendix.} \label{sec:bg_opt_th} {\bf Background Estimation With Optimal Threshold.} A description of the algorithm used by FRETBursts to compute the optimal threshold for background estimation.

The functions used to fit the background (i.e. \verb|bg.exp_fit| and other functions in the \verb|bg| module) also provide a goodness-of-fit estimator computed from the empirical distribution function (EDF)~\cite{Stephens1974,Parr1980}. The ``distance'' between the EDF and the theoretical (i.e. exponential) cumulative distribution represents an indicator of the quality of fit. Two different distance metrics can be returned by the background fitting functions. The first is the Kolmogorov-Smirnov statistic, which uses the maximum of the difference between the EDF and the theoretical distribution. The second is the Cramér-von Mises statistic, corresponding to the integral of the squared residuals (see the code for more details, \href{https://github.com/tritemio/FRETBursts/blob/master/fretbursts/background.py#L43}{link}). In principle, the optimal inter-photon delay threshold will minimize the error metric. This approach is implemented by the function \verb|calc_bg_brute| (\href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.calc_bg_brute}{link}) which performs a brute-force search in order to find the optimal threshold. This optimization is not necessary under typical experimental conditions, because the estimated rates normally change only by a few percent compared to the heuristic threshold selection used by default.

\subsection{Burst Weights} \label{sec:burstweights_theory}
\subsubsection{Theory}
Freely-diffusing molecules across a Gaussian excitation volume give rise to a burst size distribution that is exponentially distributed. In a static FRET population, burst counts in the acceptor channel can be modeled as a binomial random variable (RV) with success probability equal to the population PR and number of trials equal to the burst size $n_d + n_a$. Similarly, the PR of each burst $E_i$ ($i$ being the burst index) is simply a binomial divided by the number of trials, with variance reported in eq.~\ref{eq:E_var}, where $n_{ti}$ is the size of burst $i$.
\begin{equation}
\label{eq:E_var}
\operatorname{Var} (E_i) = \frac{E_p\,(1 - E_p)}{n_{ti}}
\end{equation}
Bursts with higher counts provide more accurate estimations of the population PR, since their PR variance is smaller (eq.~\ref{eq:E_var}). Therefore, in estimating the population PR we need to ``focus'' on bigger bursts. Traditionally, this is accomplished by merely discarding bursts below a size threshold. In the following paragraphs we demonstrate how, by properly weighting bursts, it is possible to obtain optimal estimates of PR which take into account the information of the entire burst population.

According to the Cramér-Rao lower bound (eq.~\ref{eq:cramer_rao}), the Fisher information $\mathcal{I}(\theta)$ sets a lower bound on the variance of any unbiased estimator $\hat{p}$ of a distribution parameter $\theta$.
\begin{equation}
\label{eq:cramer_rao}
\operatorname{Var}\left(\hat{p}\right) \ge \frac{1}{\mathcal{I}(\theta)}
\end{equation}
When the statistic $\hat{p}$ is an unbiased estimator of a distribution parameter and the equality holds in eq.~\ref{eq:cramer_rao}, the estimator is a minimum-variance unbiased (MVUB) estimator and it is said to be efficient (meaning that it makes optimal use of the information contained in the sample to estimate the parameter).

A population of $N$ bursts can be modeled by a set of $N$ binomial variables with the same success probability $E_p$ and a varying number of trials equal to the burst size. An estimator for $E_p$ can be constructed by noticing that the sum of binomial RVs with the same success probability is still a binomial (with number of trials equal to the sum of the number of trials). Taking the sum of acceptor counts over all bursts divided by the total number of photons as in eq.~\ref{eq:E_estim}, we obtain an estimator $\hat{E}$ of the proportion of successes.
\begin{equation}
\label{eq:E_estim}
\hat{E} = \frac{\sum_i n_{ai}}{\sum_i n_{ti}}
\end{equation}

\paragraph*{S6 Appendix.} \label{sec:burstweights_theory} {\bf Burst Weights.} Theory underpinning the choice of

The variance of $\hat{E}$ (eq.~\ref{eq:E_variance}) is equal to the inverse of the Fisher information $\mathcal{I}(\hat{E})$ and therefore $\hat{E}$ is a MVUB estimator for $E_p$.
\begin{equation}
\label{eq:E_variance}
\operatorname{Var}(\hat{E}) = \frac{E_p (1 - E_p)}{\sum_i n_{ti}} = \frac{1}{\mathcal{I}(\hat{E})}
\end{equation}
We can finally verify that $\hat{E}$ is equal to the weighted average of the bursts PR $E_i$ (eq.~\ref{eq:E_wmean}), with weights proportional to the burst size (eq.~\ref{eq:weights}).
\begin{equation}
\label{eq:weights}
w_i = \frac{n_{ti}}{\sum_j n_{tj}}
\end{equation}
\begin{equation}
\label{eq:E_wmean}
\hat{E}_w = \sum_i w_i E_i = \frac{\sum_i n_{ti} \frac{n_{ai}}{n_{ti}} }{\sum_i n_{ti}} = \hat{E}
\end{equation}
Note that the weights in eq.~\ref{eq:weights} are already normalized (they sum to 1), so no additional $1/N$ factor is needed in eq.~\ref{eq:E_wmean}. Since $\hat{E}$ is the MVUB estimator, any other estimator of $E_p$ (in particular the unweighted mean of $E_i$) will have a larger variance.

We can extend these considerations of optimal weights for the PR estimator to the FRET distribution plot (histograms or KDEs). Building an unweighted histogram (and fitting the peak) is analogous to estimating $E_p$ with an unweighted average. Conversely, building the FRET histogram using the burst size as weights is equivalent to using the MVUB estimator for $E_p$.

\subsubsection{Weighted FRET estimator}
Here we report a simple verification of the results of the previous section, namely that a weighted mean of $E_i$ is the estimator of $E_p$ with minimal variance. For this purpose, we generated a static FRET population of 100 bursts by simply extracting burst sizes from an exponential distribution ($\lambda = 10$) and acceptor counts from a binomial distribution ($E_p = 0.2$). By repeatedly fitting the population parameter $E_p$ using a size-weighted and an unweighted average, we verified that the former has systematically lower variance than the latter, as predicted by the theory (in the current example the unweighted estimator has $28.6\,\%$ higher variance). Note that this result holds for any arbitrary distribution of burst sizes. The full simulation including exponential and gamma-distributed burst sizes is reported in the accompanying Jupyter notebook (\href{http://nbviewer.jupyter.org/github/tritemio/fretbursts_paper/blob/master/notebooks/Figures%20-%20Burst%20Weights.ipynb}{link}).
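The comparison described above can be sketched with a few lines of NumPy. This is an illustrative re-creation under stated assumptions, not the code of the accompanying notebook: $\lambda = 10$ is interpreted here as the mean of the exponential distribution, burst sizes are rounded up to integers of at least 1, and the number of repetitions and the seed are arbitrary. The numerical variance ratio therefore need not match the value quoted in the text.

```python
import numpy as np

# Assumed simulation parameters (n_repeats and seed are arbitrary choices)
n_bursts, mean_size, E_p, n_repeats = 100, 10.0, 0.2, 5000
rng = np.random.default_rng(0)

# Burst sizes: exponential with mean `mean_size`, rounded up, at least 1 photon
sizes = np.ceil(rng.exponential(mean_size, size=(n_repeats, n_bursts)))
sizes = np.maximum(sizes, 1).astype(int)

# Acceptor counts: binomial with n = burst size and p = E_p
n_acc = rng.binomial(sizes, E_p)

# Per-burst proximity ratio E_i
E_i = n_acc / sizes

# One estimate of E_p per simulated population (one per row)
E_unweighted = E_i.mean(axis=1)
E_weighted = n_acc.sum(axis=1) / sizes.sum(axis=1)  # size-weighted mean of E_i

print("mean (weighted, unweighted):", E_weighted.mean(), E_unweighted.mean())
print("variance ratio unweighted/weighted:",
      E_unweighted.var() / E_weighted.var())
```

Both estimators are centered on $E_p$, while the size-weighted one shows the smaller spread across repetitions.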
\subsubsection{Weighted FRET histogram}
The effect of weighting the FRET histogram is illustrated here with a simulation of a mixture of two static FRET populations and then with experimental data.

We performed a realistic simulation of a static mixture of two FRET populations starting from 3-D Brownian motion diffusion of $N$ particles excited by a numerically computed (non-Gaussian) PSF. Input parameters of the simulation include diffusion coefficient, particle brightness, the two FRET efficiencies, as well as the detectors' DCR. The simulation is performed with the open source software PyBroMo~\cite{Ingargiola_2016} which creates smFRET data files (i.e. timestamps and detectors arrays) in Photon-HDF5 format~\cite{Ingargiola2016}. The simulated data file is processed with FRETBursts by performing a burst search followed by only a minimal burst size selection with a threshold of 10 photons. The resulting weighted and unweighted FRET histograms are reported in figure~\ref{fig:weight_fret_sim}. We notice that the use of the weights results in a better definition of the FRET peaks.

As a final comparison, we report the weighted and unweighted FRET histograms of an experimental FRET population from a measurement of a di-labeled dsDNA sample. Figure~\ref{fig:weight_fret_meas} shows a comparison of FRET histograms obtained from the same bursts with and without weights. The burst selection is obtained by applying a burst size threshold of 10 counts (after background correction), in order to filter out the extreme low end of the burst size distribution.

The use of size-weighted FRET histograms is a simple way to obtain a representation of the FRET distribution that maintains a high power of resolving FRET peaks while including the full burst population and thus reducing statistical noise.
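As a rough illustration of what size weighting means at the histogram level, the sketch below builds unweighted and size-weighted histograms with \verb|numpy.histogram|. It uses a toy binomial model rather than the PyBroMo simulation described above; the two FRET values (0.3 and 0.7), the mean burst size, the binning and the seed are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy burst population (assumed parameters, not the PyBroMo simulation)
sizes = np.ceil(rng.exponential(20.0, size=4000)).astype(int)
sizes = sizes[sizes >= 10]                      # minimal size selection: 10 photons
E_pop = rng.choice([0.3, 0.7], size=sizes.size)  # population of each burst
n_acc = rng.binomial(sizes, E_pop)               # acceptor counts per burst
E = n_acc / sizes                                # per-burst proximity ratio

bins = np.linspace(-0.1, 1.1, 41)

# Unweighted histogram: every burst counts as 1
hist_unw, edges = np.histogram(E, bins=bins, density=True)

# Size-weighted histogram: every burst counts proportionally to its size
hist_w, _ = np.histogram(E, bins=bins, weights=sizes, density=True)

centers = 0.5 * (edges[:-1] + edges[1:])
print(centers[np.argmax(hist_unw)], centers[np.argmax(hist_w)])
```

With \verb|density=True| both histograms integrate to 1, so they can be overlaid directly for comparison.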
As a final remark, note that when increasing the size threshold for burst selection the difference between weighted and unweighted FRET histograms tends to zero, because the weights of the selected bursts become more similar (i.e. their ratio tends to 1, meaning equal weights).
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.49\columnwidth]{figures/weight_fret_hist_sim_mixture/weight_fret_hist_sim_mixture}
\caption{\label{fig:weight_fret_sim}
Comparison of unweighted and size-weighted FRET histograms for a simulated mixture of static FRET populations. In both cases bursts are selected with a size threshold of 10 photons (after background correction).%
}
\end{center}
\end{figure}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.49\columnwidth]{figures/weight_fret_hist_measurement/weight_fret_hist_measurement}
\caption{\label{fig:weight_fret_meas}
Comparison of unweighted and size-weighted FRET histograms for a smFRET measurement of a static FRET sample (di-labeled dsDNA). In both cases bursts are selected with a size threshold of 10 photons (after background correction).%
}
\end{center}
\end{figure}
estimation.
\nolinenumbers
\bibliography{bibliography/converted_to_latex.bib%