Antonino Ingargiola Add diff files
almost 8 years ago
Commit id: ee446b5c7314979ed3b96dd77cc9ba78e17af759
diff --git a/FRETBursts authorea latex/diff_928_161.tex b/FRETBursts authorea latex/diff_928_161.tex
new file mode 100644
index 0000000..6f90176
--- /dev/null
+++ b/FRETBursts authorea latex/diff_928_161.tex
...
% Template for PLoS
%DIF LATEXDIFF DIFFERENCE FILE
%DIF DEL full_article_928.tex Tue Jun 28 13:25:24 2016
%DIF ADD full_article_161.tex Thu Jun 30 12:52:03 2016
% Version 3.1 February 2015
%
% To compile to pdf, run:
% latex plos.template
% bibtex plos.template
% latex plos.template
% latex plos.template
% dvipdf plos.template
%
% % % % % % % % % % % % % % % % % % % % % %
%
% -- IMPORTANT NOTE
%
% This template contains comments intended
% to minimize problems and delays during our production
% process. Please follow the template instructions
% whenever possible.
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% Once your paper is accepted for publication,
% PLEASE REMOVE ALL TRACKED CHANGES in this file and leave only
% the final text of your manuscript.
%
% There are no restrictions on package use within the LaTeX files except that
% no packages listed in the template may be deleted.
%
% Please do not include colors or graphics in the text.
%
% Please do not create a heading level below \subsection. For 3rd level headings, use \paragraph*{}.
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% -- FIGURES AND TABLES
%
% Please include tables/figure captions directly after the paragraph where they are first cited in the text.
%
% DO NOT INCLUDE GRAPHICS IN YOUR MANUSCRIPT
% - Figures should be uploaded separately from your manuscript file.
% - Figures generated using LaTeX should be extracted and removed from the PDF before submission.
% - Figures containing multiple panels/subfigures must be combined into one image file before submission.
% For figure citations, please use "Fig." instead of "Figure".
% See http://www.plosone.org/static/figureGuidelines for PLOS figure guidelines.
%
% Tables should be cell-based and may not contain:
% - tabs/spacing/line breaks within cells to alter layout or alignment
% - vertically-merged cells (no tabular environments within tabular environments, do not use \multirow)
% - colors, shading, or graphic objects
% See http://www.plosone.org/static/figureGuidelines#tables for table guidelines.
%
% For tables that exceed the width of the text column, use the adjustwidth environment as illustrated in the example table in text below.
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% -- EQUATIONS, MATH SYMBOLS, SUBSCRIPTS, AND SUPERSCRIPTS
%
% IMPORTANT
% Below are a few tips to help format your equations and other special characters according to our specifications. For more tips to help reduce the possibility of formatting errors during conversion, please see our LaTeX guidelines at http://www.plosone.org/static/latexGuidelines
%
% Please be sure to include all portions of an equation in the math environment.
%
% Do not include text that is not math in the math environment. For example, CO2 will be CO\textsubscript{2}.
%
% Please add line breaks to long display equations when possible in order to fit size of the column.
%
% For inline equations, please do not include punctuation (commas, etc) within the math environment unless this is part of the equation.
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% Please contact [email protected] with any questions.
%
% % % % % % % % % % % % % % % % % % % % % % % %
\documentclass[10pt,letterpaper]{article}
\usepackage[top=0.85in,left=2.75in,footskip=0.75in]{geometry}
% Use adjustwidth environment to exceed column width (see example table in text)
\usepackage{changepage}
% Use Unicode characters when possible
%\usepackage[utf8]{inputenc}
% textcomp package and marvosym package for additional characters
\usepackage{textcomp,marvosym}
% fixltx2e package for \textsubscript
\usepackage{fixltx2e}
% amsmath and amssymb packages, useful for mathematical formulas and symbols
\usepackage{amsmath,amssymb}
% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}
% Use nameref to cite supporting information files (see Supporting Information section for more info)
\usepackage{nameref}
\usepackage{color}
\usepackage[colorlinks=true,
linkcolor=blue,
urlcolor=blue,
citecolor=black]{hyperref}
% line numbers
\usepackage[right]{lineno}
% ligatures disabled
\usepackage{microtype}
\DisableLigatures[f]{encoding = *, family = * }
% rotating package for sideways tables
\usepackage{rotating}
% Remove comment for double spacing
%\usepackage{setspace}
%\doublespacing
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{multirow,booktabs}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\usepackage[utf8]{inputenc}
\usepackage[ngerman,greek,english]{babel}
%% Neutralize any \includegraphics in the document, as PLOS does not allow figures in the final submission
\makeatletter
\let\orig@includegraphics\includegraphics
\AtBeginDocument{\let\includegraphics\PLOS@ignore}
\newcommand{\PLOS@ignore}[2][]{}
\makeatother
% Text layout
\raggedright
\setlength{\parindent}{0.5cm}
\textwidth 5.25in
\textheight 8.75in
% Bold the 'Figure #' in the caption and separate it from the title/caption with a period
% Captions will be left justified
\usepackage[aboveskip=1pt,labelfont=bf,labelsep=period,justification=raggedright,singlelinecheck=off]{caption}
% Use the PLoS provided BiBTeX style
\bibliographystyle{plos2015}
% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother
% Leave date blank
\date{}
% Header and Footer with logo
\usepackage{lastpage,fancyhdr,graphicx}
\usepackage{epstopdf}
\pagestyle{myheadings}
\pagestyle{fancy}
\fancyhf{}
\makeatletter
\lhead{\orig@includegraphics[width=2.0in]{PLOS-submission.eps}}
\makeatother
\rfoot{\thepage/\pageref{LastPage}}
\renewcommand{\footrule}{\hrule height 2pt \vspace{2mm}}
\fancyheadoffset[L]{2.25in}
\fancyfootoffset[L]{2.25in}
\lfoot{\sf PLOS}
%% Include all macros below
\newcommand{\lorem}{{\bf LOREM}}
\newcommand{\ipsum}{{\bf IPSUM}}
\usepackage{color}
\usepackage{listings}
\lstset{ %
backgroundcolor=\color{white}, % choose the background color
basicstyle=\footnotesize\ttfamily, % size of fonts used for the code
breaklines=true, % automatic line breaking only at whitespace
captionpos=b, % sets the caption-position to bottom
commentstyle=\color{OliveGreen}, % comment style
keywordstyle=\color{blue}, % keyword style
stringstyle=\color{black}, % string literal style
language=Python, % Set your language (you can change the language for each code-block optionally)
frame=l, %
xleftmargin=\fboxsep, %
xrightmargin=-\fboxsep, %
}
\hyphenation{smFRET}
\hyphenation{FRETBursts}
%% END MACROS SECTION
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF UNDERLINE PREAMBLE %DIF PREAMBLE
\RequirePackage[normalem]{ulem} %DIF PREAMBLE
\RequirePackage{color}\definecolor{RED}{rgb}{1,0,0}\definecolor{BLUE}{rgb}{0,0,1} %DIF PREAMBLE
\providecommand{\DIFaddtex}[1]{{\protect\color{blue}\uwave{#1}}} %DIF PREAMBLE
\providecommand{\DIFdeltex}[1]{{\protect\color{red}\sout{#1}}} %DIF PREAMBLE
%DIF SAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddbegin}{} %DIF PREAMBLE
\providecommand{\DIFaddend}{} %DIF PREAMBLE
\providecommand{\DIFdelbegin}{} %DIF PREAMBLE
\providecommand{\DIFdelend}{} %DIF PREAMBLE
%DIF FLOATSAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddFL}[1]{\DIFadd{#1}} %DIF PREAMBLE
\providecommand{\DIFdelFL}[1]{\DIFdel{#1}} %DIF PREAMBLE
\providecommand{\DIFaddbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFaddendFL}{} %DIF PREAMBLE
\providecommand{\DIFdelbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFdelendFL}{} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF HYPERREF PREAMBLE %DIF PREAMBLE
\providecommand{\DIFadd}[1]{\texorpdfstring{\DIFaddtex{#1}}{#1}} %DIF PREAMBLE
\providecommand{\DIFdel}[1]{\texorpdfstring{\DIFdeltex{#1}}{}} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
\begin{document}
\vspace*{0.35in}
% Title must be 250 characters or less.
\begin{flushleft}
{\Large
\textbf\newline{\input{title}}
}
\newline
% Insert author names, affiliations and corresponding author email (do not include titles, positions, or degrees).
\\
Antonino Ingargiola\textsuperscript{1*},
Eitan Lerner\textsuperscript{1},
SangYoon Chung\textsuperscript{1},
Shimon Weiss\textsuperscript{1},
Xavier Michalet\textsuperscript{1},
\\
\bigskip
\textbf{1} Dept. Chemistry and Biochemistry, Univ. of California Los Angeles, Los Angeles, CA, USA
\bigskip
% Use the asterisk to denote corresponding authorship and provide email address in note below.
* [email protected]
\end{flushleft}
% Please keep the abstract below 300 words
\section*{Abstract}
Single-molecule Förster Resonance Energy Transfer (smFRET) allows
probing intermolecular interactions and conformational changes in
biomacromolecules, and represents an invaluable tool for studying
cellular processes at the molecular scale. smFRET experiments can
detect the distance between two fluorescent labels (donor and acceptor)
in the 3--10~nm range. In the commonly employed confocal geometry,
molecules are free to diffuse in solution. When a molecule traverses
the excitation volume, it emits a burst of photons, which can be detected
by single-photon avalanche diode (SPAD) detectors. The intensities of
donor and acceptor fluorescence can then be related to the distance
between the two fluorophores.
While recent years have seen a growing number of contributions
proposing improvements or new techniques in smFRET data analysis,
rarely have those publications been accompanied by software implementation.
In particular, despite the widespread application of smFRET, no complete
software package for smFRET burst analysis is freely available to date.
In this paper, we introduce FRETBursts, an open source software
for analysis of freely-diffusing smFRET data.
FRETBursts allows executing all the fundamental steps of smFRET bursts
analysis using state-of-the-art as well as novel techniques,
while providing an open, robust and well-documented implementation.
Therefore, FRETBursts represents an ideal platform for comparison
and development of new methods in burst analysis.
We employ modern software engineering principles in order to
minimize bugs and facilitate long-term maintainability.
Furthermore, we place a strong focus on reproducibility by relying on
Jupyter notebooks for FRETBursts execution.
Notebooks are executable documents capturing all the steps of the
analysis (including data files, input parameters, and results) and can
be easily shared to replicate complete smFRET analyses.
Notebooks allow beginners to execute complex workflows
and advanced users to customize the analysis for their own needs.
By bundling analysis description, code and results in a single document,
FRETBursts allows seamless sharing of analysis workflows
and results, encourages reproducibility and facilitates collaboration
among researchers in the single-molecule community.
% Please keep the Author Summary between 150 and 200 words
% Use first person. PLOS ONE authors please skip this step.
% Author Summary not valid for PLOS ONE submissions.
%\section*{Author Summary}
\linenumbers
\section*{Introduction}
\subsection*{Open Science and Reproducibility}
Over the past 20 years, single-molecule FRET (smFRET) has grown into one of the most
useful techniques in single-molecule spectroscopy~\cite{Weiss_1999,Hohlbein_2014}.
While it is possible to extract information on sub-populations using ensemble measurements
(e.g.~\cite{Lerner_2014,Rahamim_2015}),
smFRET's unique feature is its ability to straightforwardly resolve conformational
changes of biomolecules or measure binding-unbinding kinetics in heterogeneous
samples~\cite{Selvin_2000,Roy_2008,Schuler_2008,Sisamakis_2010,Haran_2012}.
smFRET measurements on freely diffusing molecules (the focus of this paper)
have the additional advantage, over measurements performed on immobilized molecules,
of probing molecules and processes without perturbation from surface
immobilization or additional functionalization needed for surface
attachment~\cite{Eggeling_1998,Dahan_1999}.
The increasing amount of work using freely-diffusing smFRET has motivated
a growing number of theoretical contributions to the specific topic of data
analysis~\cite{Fries_1998,Eggeling_2001,Zhang_2005,Gopich_2005,Lee_2005,Nir_2006,Antonik2006,Gopich_2007,Gopich_2008,Camley_2009,Santoso_2010,Torella_2011,Tomov_2012}.
Despite this profusion of publications, most research groups still rely on
their own implementation of a limited number of methods, with very little
collaboration or code sharing.
To clarify this statement, let us point out that our own group's past smFRET papers
merely mention the use of custom-made software without additional details~\cite{Lee_2005,Nir_2006}.
Even though some of these software tools are made available upon request,
or sometimes shared publicly on websites,
it remains hard to reproduce and validate results from different groups,
let alone build upon them.
Additionally, as new methods are proposed in the literature,
it is generally difficult to quantify their performance compared to other methods.
An independent quantitative assessment
would require a complete reimplementation, an effort few groups can afford.
As a result, potentially useful analysis improvements
are either rarely or slowly adopted by the community.
In contrast with other established traditions such as
sharing protocols and samples, in the domain of scientific software,
we have relegated ourselves to islands of non-communication.
From a more general standpoint, the non-availability of the code
used to produce scientific results hinders reproducibility,
makes it impossible to review and validate the software's correctness
and prevents improvements and extensions by other scientists.
This situation, common in many disciplines,
represents a real impediment to scientific progress.
Since the pioneering work of the Donoho group in the 1990s~\cite{Buckheit_1995},
it has become evident that developing and maintaining open source scientific software
for reproducible research is a critical requirement of the modern
scientific enterprise~\cite{Ince_2012,Vihinen_2015}.
%Peer-reviewed publications describing such software are also necessary~\cite{Pradal_2013},
%although the debate is still open on the most effective model for peer-reviewing this
%class of publications~\cite{Check_Hayden_2013,Check_Hayden_2015}
%(\href{https://software-carpentry.org/blog/2015/04/quality-is-free-getting-there-isnt.html}{Willson 2015})
%(\href{https://www.mozillascience.org/effective-code-review-for-journals}{Mills 2015})
%(\href{http://ivory.idyll.org/blog/2015-we-live-in-a-bubble.html}{Brown 2015} and \href{http://ivory.idyll.org/blog/on-code-review-of-scientific-code.html}{2013}).
Other disciplines have started tackling this issue~\cite{Eglen_2016},
and even in the single-molecule field a few recent publications have provided
software for analysis of surface-immobilized experiments~\cite{McKinney_2006,Bronson_2009,Greenfeld_2012,K_nig_2013,van_de_Meent_2014}.
For freely-diffusing smFRET experiments, although it is common to find mention of
``code available from the authors upon request'' in publications, there is a dearth
of such open source code, with, to our knowledge, the notable exception of a single
example~\cite{Murphy2014}.
To address this issue, we have developed FRETBursts,
an open source Python software for analysis of freely-diffusing single-molecule FRET measurements.
FRETBursts can be used, inspected and modified by anyone interested in using
state-of-the-art smFRET analysis methods or implementing modifications or completely new techniques.
FRETBursts therefore represents an ideal platform
for quantitative comparison of different methods for smFRET burst analysis.
Technically, a strong emphasis has been given to the reproducibility of complete analysis
workflows. FRETBursts uses Jupyter Notebooks~\cite{Shen_2014},
interactive and executable documents containing textual narrative, input parameters,
code, and computational results (tables, plots, etc.). A notebook thus captures the various analysis steps
in a document which is easy to share and execute.
To minimize the possibility of bugs being introduced inadvertently~\cite{Soergel_2015},
we employ modern software engineering techniques
such as unit testing and continuous integration~\cite{Wilson_2014,Eglen_2016}.
FRETBursts is hosted on GitHub~\cite{Blischak_2016,Prli__2012},
where users can write comments, report issues or contribute code.
In a related effort, we recently introduced Photon-HDF5~\cite{Ingargiola2016},
an open file format for timestamp-based single-molecule fluorescence
experiments. Another related open source tool is PyBroMo~\cite{Ingargiola_2016},
a freely-diffusing smFRET simulator which produces Photon-HDF5 files that are
directly analyzable with FRETBursts.
Together with all the aforementioned tools, FRETBursts contributes to the growing
ecosystem of open tools for reproducible science in the single-molecule field.
\subsection*{Paper Overview}
This paper is written as an introduction to smFRET burst analysis and
its implementation in FRETBursts.
The aim is to illustrate the specificities and
trade-offs involved in various approaches
with sufficient details to enable readers
to customize the analysis for their own needs.
After a brief overview of FRETBursts features (section~\nameref{sec:overview}),
we introduce essential concepts and terminology for smFRET burst analysis
(section~\nameref{sec:concepts}).
In section~\nameref{sec:analysis}, we illustrate the steps involved
in smFRET burst analysis: (i) data loading (section~\nameref{sec:dataload}),
(ii) definition of the excitation alternation periods
(section~\nameref{sec:alternation}), (iii) background correction
(section~\nameref{sec:bg_calc}), (iv) burst search
(section~\nameref{sec:burstsearch}),
(v) burst selection (section~\nameref{sec:burstsel}) and
(vi) FRET histogram fitting (section~\nameref{sec:fretfit}).
As an example
of implementation of an advanced data processing technique,
section~\nameref{sec:bva} walks the reader through implementing
Burst Variance Analysis (BVA)~\cite{Torella_2011}.
Finally, section~\nameref{sec:conclusions} summarizes what we believe
to be the strengths of FRETBursts software.
Throughout this paper,
links to relevant sections of documentation and other web resources
are displayed as ``(link)''.
In order to make the text more legible,
we have concentrated Python-specific details in paragraphs titled
\textit{Python details}. These subsections provide deeper insights for readers
already familiar with Python and can be initially skipped by readers who are not.
Finally, note that all commands and figures in this paper can be regenerated
using the accompanying notebooks
(\href{https://github.com/tritemio/fretbursts_paper}{link}).
\section*{FRETBursts Overview}
\label{sec:overview}
\subsection*{Technical Features}
FRETBursts can analyze smFRET measurements
from one or multiple excitation spots~\cite{Ingargiola_2013}. The supported
excitation schemes include single laser, alternating laser excitation (ALEX)
with either CW lasers (μs-ALEX~\cite{Kapanidis_2005})
or pulsed lasers (ns-ALEX~\cite{Laurence_2005} or
pulsed-interleaved excitation (PIE)~\cite{M_ller_2005}).
The software implements both standard and novel algorithms for smFRET data analysis
including background estimation as a function of time (including background accuracy
metrics), sliding-window burst search~\cite{Eggeling_1998},
dual-channel burst search (DCBS)~\cite{Nir_2006} and
modular burst selection methods based on user-defined criteria
(including a large set of pre-defined selection rules). Novel features include burst size
selection with $\gamma$-corrected burst sizes, burst weighting, and burst search with
a background-dependent threshold (in order to guarantee a minimal signal-to-background
ratio~\cite{Michalet_2012}).
Moreover, FRETBursts provides a large set of fitting options to characterize FRET subpopulations.
In particular, distributions of burst quantities (such as $E$ or $S$) can be assessed
through (1) histogram fitting (with arbitrary model functions),
(2) non-parametric weighted kernel density estimation (KDE), (3) weighted
expectation-maximization (EM), (4) maximum likelihood fitting using Gaussian models
or Poisson statistics. Finally, FRETBursts includes a large number of
predefined and customizable plot functions which (thanks to the \textit{matplotlib}
graphic library) produce publication quality plots in a wide range of formats.
Additionally, implementations of population dynamics analysis such
as Burst Variance Analysis (BVA)~\cite{Torella_2011} and two-channel
kernel density distribution estimator (2CDE)~\cite{Tomov_2012}
are available as FRETBursts notebooks
\DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{BVA link},
\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%202CDE%20Method.ipynb}{2CDE link}\DIFadd{).
}\DIFaddend
\subsection*{Software Availability}
FRETBursts is hosted and openly developed on GitHub. The FRETBursts homepage
(\href{http://tritemio.github.io/FRETBursts}{link})
contains links to the various resources. \DIFaddbegin \DIFadd{Pre-built packages are provided for
Windows, OS X and Linux. }\DIFaddend Installation instructions
can be found in the Reference Documentation
(\href{http://fretbursts.readthedocs.org/en/latest/getting_started.html}{link}).
A description of FRETBursts execution using Jupyter notebooks is reported
in~\nameref{sec:notebook}. % SI_link
Detailed information on development style, testing strategies and
contribution guidelines is reported in~\nameref{sec:dev}. % SI_link
Finally, to facilitate evaluation and comparison with other software,
we set up an online service that allows executing FRETBursts
without requiring any installation on the user's computer (\href{https://github.com/tritemio/FRETBursts_notebooks#run-online}{link}).
\section*{Architecture and Concepts}
\label{sec:concepts}
In this section, we introduce some general burst analysis concepts
and notations used in FRETBursts.
\subsection*{Photon Streams}
\label{sec:ph_streams}
The raw data collected during a smFRET experiment consists of one or more arrays of
photon timestamps, whose temporal resolution is set by the acquisition hardware,
typically between 10 and 50 ns.
In single-spot measurements, all timestamps are stored in a single array. In multispot
measurements~\cite{Ingargiola_2013}, there are as many timestamp arrays
as excitation spots.
Each array contains timestamps from both donor (D) and acceptor (A) channels.
When alternating excitation lasers are used (ALEX measurements)~\cite{Lee_2005},
a further distinction between photons emitted during the D or A excitation periods can be made.
In FRETBursts, the corresponding sets of photons are called ``photon streams''
and are specified with a \verb|Ph_sel| object
(\href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{link}).
In non-ALEX smFRET data, there are 3 photon streams
(table~\ref{tab:ph_sel_smfret}), while in \DIFaddbegin \DIFadd{two-color }\DIFaddend ALEX data,
there are 5 streams (table~\ref{tab:ph_sel_alex}).
The \verb|Ph_sel| class (\href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{link})
allows the specification of any combination of photon streams.
For example, in ALEX measurements, the D-emission during A-excitation stream is
usually ignored because it does not contain any useful signal~\cite{Lee_2005}.
To select all photons except those in this stream, the syntax is
\verb|Ph_sel(Dex='DAem', Aex='Aem')|, which indicates selection of donor
and acceptor photons (\verb|DAem|) during donor excitation (\verb|Dex|) and only acceptor
photons (\verb|Aem|) during acceptor excitation (\verb|Aex|).
\begin{table}
\begin{tabular}{l|l}
Photon selection & code \\
\hline
All-photons & \verb|Ph_sel('all')|\\
D-emission & \verb|Ph_sel(Dex='Dem')|\\
A-emission & \verb|Ph_sel(Dex='Aem')|\\
\end{tabular}
\caption{\label{tab:ph_sel_smfret}Photon selection syntax (non-ALEX)}
\end{table}
\begin{table}
\begin{tabular}{l|l}
Photon selection & code \\
\hline
All-photons & \verb|Ph_sel('all')|\\
D-emission during D-excitation & \verb|Ph_sel(Dex='Dem')|\\
A-emission during D-excitation & \verb|Ph_sel(Dex='Aem')|\\
D-emission during A-excitation & \verb|Ph_sel(Aex='Dem')|\\
A-emission during A-excitation & \verb|Ph_sel(Aex='Aem')|\\
\end{tabular}
\caption{\label{tab:ph_sel_alex}Photon selection syntax (ALEX)}
\end{table}
\subsection*{Background Definitions}
\label{sec:bg_intro}
An estimate of the background rates is needed both to select a proper threshold for
burst search, and to correct the raw burst counts by \DIFdelbegin \DIFdel{subtraction of }\DIFdelend \DIFaddbegin \DIFadd{subtracting }\DIFaddend background counts.
The recorded stream of timestamps is the result of two processes: one characterized
by a high count rate, due to fluorescence photons of single molecules crossing the
excitation volume, and another characterized by a lower count rate, due to ``background
counts'' originating from detector dark counts, afterpulsing, out-of-focus molecules
and sample scattering and/or impurities~\cite{Edman_1996,Gopich_2008}.
The signature of these two types of processes can be
observed in the inter-photon delays distribution (i.e. the waiting times
between two subsequent timestamps) as illustrated in figure~\ref{fig:bg_dist_all}(a).
The ``tail'' of the distribution (a straight line in semi-log scale) corresponds
to exponentially-distributed time-delays, indicating that those counts are generated by a
Poisson process. At short
timescales, the distribution departs from the exponential due to the contribution
of the higher rate process of single molecules traversing the excitation volume.
To estimate the background rate (i.e. the inverse of the exponential time constant),
it is necessary to define a time-delay threshold above which the distribution
can be considered exponential.
Finally, a parameter estimation method needs to be specified, such as Maximum
Likelihood Estimation (MLE) or non-linear least squares curve fitting of
the time-delay histogram (both supported in FRETBursts).
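As a sketch of the MLE option (in our own notation, introduced here for illustration and not necessarily identical to the estimator implemented in FRETBursts), let $\delta t_1, \dots, \delta t_n$ be the inter-photon delays that exceed the threshold $\tau_{\mathrm{th}}$. If these delays are exponentially distributed with rate $\lambda_{\mathrm{bg}}$, the memoryless property implies that the excesses $\delta t_i - \tau_{\mathrm{th}}$ follow the same exponential distribution, and maximizing the likelihood gives
\begin{equation*}
\hat{\lambda}_{\mathrm{bg}} = \frac{n}{\sum_{i=1}^{n} \left(\delta t_i - \tau_{\mathrm{th}}\right)} .
\end{equation*}
For instance, if the delays above a threshold $\tau_{\mathrm{th}} = 1$~ms have a mean of 1.5~ms, the mean excess is 0.5~ms and the estimated background rate is $1/(0.5~\mathrm{ms}) = 2000$ counts per second.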
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.77\columnwidth]{figures/ph_delays_distrib_all/ph_delays_distrib_all}
\caption{\label{fig:bg_dist_all} \textbf{Inter-photon delays fitted with an exponential function.}
Experimental distributions of inter-photon delays (\textit{dots}) and
corresponding fits of the exponential tail (\textit{solid lines}).
(\textit{Panel a}) An example of inter-photon delays distribution (\textit{red dots}) and an exponential fit
of the tail of the distribution (\textit{black line}).
(\textit{Panel b}) Inter-photon delays distribution and exponential fit for different photon streams as obtained with \texttt{dplot(d, hist\_bg)}. The \textit{dots} represent the experimental histogram for the different photon streams. The \textit{solid lines} represent the corresponding exponential fit of the tail of the distributions. The legend shows abbreviations of the photon streams
and the fitted background rates.%
}
\end{center}
\end{figure}
It is advisable to monitor the background as a function of time
throughout the measurement, in order to account for possible variations.
Experimentally, we found that when the background is not constant,
it usually varies
on time scales of tens of seconds (see figure~\ref{fig:bg_timetrace}).
FRETBursts divides the acquisition in constant-duration time
windows called \textit{background periods} and computes the background rates for
each of these windows (see section~\nameref{sec:bg_calc}).
Note that FRETBursts uses these local background rates also during burst search,
in order to compute time-dependent burst detection thresholds
and for background correction of burst data (see section~\nameref{sec:burstsearch}).
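As a minimal sketch of this correction (the symbols $n^{\mathrm{raw}}$, $w$ and $\lambda_{\mathrm{bg}}$ are our own notation, introduced here for illustration), consider a burst of duration $w$ that falls in a background period where the estimated background rate of a given photon stream is $\lambda_{\mathrm{bg}}$. The expected number of background counts in the burst is $\lambda_{\mathrm{bg}}\, w$, so the corrected count is
\begin{equation*}
n = n^{\mathrm{raw}} - \lambda_{\mathrm{bg}}\, w .
\end{equation*}
For example, a burst lasting 1~ms with 60 raw counts in a stream whose background rate is 2000 counts per second contains on average 2 background counts, giving a corrected count of 58.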
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.91\columnwidth]{figures/background_timetrace/background_timetrace}
\caption{\label{fig:bg_timetrace} \textbf{Background rates as a function of time.}
Estimated background rate as a function of time for two μs-ALEX measurements.
Different colors represent different photon streams.
(\textit{Panel a}) A measurement performed with a sealed sample chamber
exhibiting a constant background as a function of time.
(\textit{Panel b}) A measurement performed on an unsealed sample exhibiting
significant background variations due to sample evaporation and/or
photobleaching (likely of impurities on the cover-glass).
These plots are produced by the command
\texttt{dplot(d, timetrace\_bg)} after estimation of background.
Each data point in these figures is computed for a 30~s time window.%
}
\end{center}
\end{figure}
\subsection*{The \texttt{Data} Class}
\label{sec:data_intro}
The \verb|Data| class
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html}{link})
is the fundamental data container in FRETBursts. It contains the
measurement data and parameters (attributes) as well as several methods
for data analysis (background estimation, burst search, etc.).
All analysis results (bursts data, estimated parameters) are also stored
as \verb|Data| attributes.
There are 3 important ``burst counts'' attributes which contain
the number of photons detected in the donor or the acceptor channel
during donor or acceptor excitation (table~\ref{tab:data_n}).
The attributes in table~\ref{tab:data_n} are background-corrected by default.
Furthermore, \verb|na| is corrected for leakage and direct excitation
(section~\nameref{sec:corrcoeff}) if the corresponding coefficients are specified
(by default they are 0).
There is also a closely related attribute named \verb|nda| for donor photons
during acceptor excitation. \verb|nda| is normally neglected as it only contains
background.
\begin{table}
\begin{tabular}{l p{0.8\columnwidth}}
Name & Description \\
\hline
\verb|nd| & number of photons detected by the donor channel (during donor excitation period in ALEX case)\\
\verb|na| & number of photons detected by the acceptor channel (during donor excitation period in ALEX case)\\
\verb|naa| & number of photons detected by the acceptor channel during acceptor excitation period (present only in ALEX measurements)\\
\end{tabular}
\caption{\label{tab:data_n}\texttt{Data} attributes names and descriptions for burst photon counts in different photon streams.}
\end{table}
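For orientation, the burst counts of table~\ref{tab:data_n} are the inputs of the burst quantities $E$ and $S$ mentioned earlier. In their simplest form (assuming $\gamma = 1$ and no further corrections; this is the standard textbook expression, shown here as an illustration and not as a statement about FRETBursts defaults), the proximity ratio and the stoichiometry of a burst are
\begin{equation*}
E_{\mathrm{PR}} = \frac{n_a}{n_d + n_a} , \qquad
S = \frac{n_d + n_a}{n_d + n_a + n_{aa}} ,
\end{equation*}
where $n_d$, $n_a$ and $n_{aa}$ denote the values of \verb|nd|, \verb|na| and \verb|naa| for that burst. A burst with $n_d = 30$, $n_a = 70$ and $n_{aa} = 100$ therefore has $E_{\mathrm{PR}} = 0.7$ and $S = 0.5$.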
\paragraph*{Python details}
Many \verb|Data| attributes are lists of arrays (or scalars) with the length of the lists
equal to the number of excitation spots. This means that in
single-spot measurements, an array of burst-data
is accessed by specifying the index as 0, for example \verb|Data.nd[0]|.
\verb|Data| implements a shortcut syntax to access the first element of a list
with an underscore, so that an equivalent syntax is
\verb|Data.nd_| instead of \verb|Data.nd[0]|.
\subsection*{Introduction to Burst Search}
\label{sec:burstsearch_intro}
Identifying single-molecule fluorescence bursts in the stream of photons is
one of the most crucial steps in the analysis of freely-diffusing single-molecule FRET data.
The widely used ``sliding window'' algorithm, introduced by the Seidel group in
1998~\cite{Eggeling_1998,Fries_1998}, involves searching for
$m$ consecutive photons detected during a period shorter than
$\Delta t$. In other words, bursts are regions of the photon stream where the
local rate (computed using $m$ photons) is above a minimum threshold rate.
Since a universal criterion to choose the rate threshold and
the number of photons $m$ is, as of today, lacking, it has become a common
practice to manually adjust those parameters for each specific measurement.
\DIFaddbegin \DIFadd{Commonly employed values for $m$ are between 5 and 15 photons.
}\DIFaddend
A more general approach consists in taking into account the background rate of
the specific measurements and in choosing a rate threshold that is $F$ times
larger than the background rate \DIFaddbegin \DIFadd{(typical values for $F$ are between 4 and 9)}\DIFaddend .
This approach ensures that all resulting bursts
have a signal-to-background ratio (SBR) larger than
$(F-1)$~\cite{Michalet_2012}. A consistent criterion for choosing the threshold is
particularly important when comparing different measurements with different background
rates, when the background significantly varies during measurements or in
multi-spot measurements where each spot has a different background rate.
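
The sliding-window criterion described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the FRETBursts implementation: the function name \verb|sliding_window_bursts|, the rate estimator ($m$ photons divided by the window duration) and the synthetic timestamps are all assumptions made for this example.

```python
def sliding_window_bursts(times, m, min_rate):
    """Return (start, stop) photon-index pairs of bursts.

    `times` is a sorted list of timestamps. A window of m consecutive
    photons is above threshold when it spans less than m / min_rate.
    Consecutive above-threshold windows are merged into one burst.
    """
    max_duration = m / min_rate
    bursts = []
    in_burst = False
    start = stop = 0
    for i in range(len(times) - m + 1):
        above = (times[i + m - 1] - times[i]) < max_duration
        if above and not in_burst:
            start, in_burst = i, True
        if above:
            stop = i + m - 1
        elif in_burst:
            bursts.append((start, stop))
            in_burst = False
    if in_burst:
        bursts.append((start, stop))
    return bursts


# Synthetic timestamps in microseconds: sparse background photons,
# one dense burst, then sparse background again.
background = [1000 * k for k in range(20)]
burst = [20000 + 50 * k for k in range(30)]
tail = [22450 + 1000 * k for k in range(20)]
times = background + burst + tail

bg_rate = 1e-3   # counts per microsecond (1 kcps), assumed known here
F, m = 6, 10     # threshold is F times the background rate
bursts = sliding_window_bursts(times, m, min_rate=F * bg_rate)
print(bursts)
```

Expressing the threshold as \verb|F * bg_rate| rather than as a fixed rate is what makes the same call usable across measurements with different background levels.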
A second important aspect of burst search is the choice of photon stream used
to perform the search.
In most cases, for instance when identifying FRET sub-populations,
the burst search should use all \DIFdelbegin \DIFdel{photons (i.e. APBS). In some }\DIFdelend \DIFaddbegin \DIFadd{the photons, the so-called
all-photon burst search (APBS)~\mbox{%DIFAUXCMD
\cite{Eggeling_1998,Fries_1998,Nir_2006}}%DIFAUXCMD
.
In }\DIFaddend other cases, \DIFaddbegin \DIFadd{for example }\DIFaddend when focusing on
donor-only or \DIFdelbegin \DIFdel{acceptor only }\DIFdelend \DIFaddbegin \DIFadd{acceptor-only }\DIFaddend populations, it is better to perform
the search using only donor or acceptor signal.
In order to handle the general case and to provide flexibility,
FRETBursts allows performing the burst search on arbitrary selections of photons
(see section~\nameref{sec:ph_streams} for more information on photon stream definitions).
Additionally, Nir~\textit{et al.}~\cite{Nir_2006} proposed \DIFdelbegin \DIFdel{DCBS (``}\DIFdelend \DIFaddbegin \DIFadd{a }\DIFaddend dual-channel
burst search \DIFdelbegin \DIFdel{'')
, }\DIFdelend \DIFaddbegin \DIFadd{(DCBS)
}\DIFaddend which can help mitigate artifacts due to photophysical effects such as blinking.
During DCBS, a search is performed \DIFdelbegin \DIFdel{in parallel }\DIFdelend on two photon streams
and bursts are defined as periods during which both photon streams
exhibit a rate higher than
the threshold, implementing the equivalent of an AND logic operation.
Conventionally, the term DCBS refers to a burst search where the two photon streams
are (1) all photons during donor excitation (\verb|Ph_sel(Dex='DAem')|) and
(2) acceptor channel photons during acceptor excitation (\verb|Ph_sel(Aex='Aem')|).
In FRETBursts, the user can choose arbitrary photon streams as input, and in general
this kind of search is called an ``AND-gate burst search''.
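
The AND logic can be sketched as the intersection of two lists of burst intervals, one per photon stream. The function \verb|and_gate| below is a hypothetical helper written for this illustration (it is not part of FRETBursts) and assumes that each input list is sorted and contains non-overlapping (start, stop) times.

```python
def and_gate(bursts1, bursts2):
    """Intersect two sorted lists of non-overlapping (start, stop) intervals."""
    out = []
    i = j = 0
    while i < len(bursts1) and j < len(bursts2):
        start = max(bursts1[i][0], bursts2[j][0])
        stop = min(bursts1[i][1], bursts2[j][1])
        if start < stop:
            # Both streams are above threshold between start and stop.
            out.append((start, stop))
        # Advance the interval that ends first.
        if bursts1[i][1] < bursts2[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical burst intervals (arbitrary time units) from two searches.
bursts_dex = [(10, 50), (100, 140), (200, 230)]
bursts_aex = [(30, 60), (150, 180), (210, 260)]
bursts_and = and_gate(bursts_dex, bursts_aex)
print(bursts_and)
```

Only the time ranges in which both searches report a burst survive, so a burst in which the acceptor blinks is shortened or dropped rather than being counted in full.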
After burst search, it is necessary to select
bursts, for instance by specifying a minimum number of photons (or burst size). In the most
basic form, this selection can be performed during burst search by discarding
bursts with size smaller than a threshold $L$ \DIFaddbegin \DIFadd{(typically 30 or higher)}\DIFaddend ,
as originally proposed by
Eggeling~\textit{et al.}~\cite{Eggeling_1998}.
This method, however, neglects the effect
of background and $\gamma$ factor on the burst size and can lead to a selection
bias for some channels and/or sub-populations.
For this reason, we suggest performing a burst size selection after background
correction, taking into account the $\gamma$ factor, as discussed in
sections~\nameref{sec:burstsizeweights} and~\nameref{sec:burstsel}.
In special cases, users may choose to replace (or combine)
the burst selection based on burst size
with another criterion such as burst duration or brightness (see section~\nameref{sec:burstsel}).
\subsection*{Corrected Burst Sizes and Weights}
\label{sec:burstsizeweights}
The number of photons detected during a burst (the ``burst size'')
is computed using either all photons, or photons detected
during donor excitation period. To compute the burst size, FRETBursts uses
one of the following formulas:
\begin{equation}
\label{eq:burstsize_dex}
n_{dex} = n_a + \gamma\,n_d
\end{equation}
\begin{equation}
\label{eq:burstsize_allph}
n_t = n_a + \gamma\,n_d + n_{aa}
\end{equation}
\noindent where $n_d$, $n_a$ and $n_{aa}$ are, similarly to the attributes
in table~\ref{tab:data_n}, the background-corrected
burst counts in different channels and excitation periods.
The factor $\gamma$ takes into account
different fluorescence quantum yields of donor and acceptor fluorophores and different
photon detection efficiencies between donor and acceptor detection
channels~\cite{Deniz_1999,Lee_2005}.
Eq.~\ref{eq:burstsize_dex} includes counts collected during donor excitation periods only,
while eq.~\ref{eq:burstsize_allph} includes all counts.
Burst sizes computed according to eq.~\ref{eq:burstsize_dex}
or~\ref{eq:burstsize_allph} are called $\gamma$-corrected burst sizes.
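
As a worked example of eq.~\ref{eq:burstsize_dex} and~\ref{eq:burstsize_allph}, the following sketch evaluates both formulas for a single burst. The function \verb|burst_size| and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow; the counts are assumed to be already background-corrected.

```python
def burst_size(nd, na, naa=0.0, gamma=1.0, add_naa=False):
    """Gamma-corrected burst size: na + gamma*nd, optionally plus naa."""
    size = na + gamma * nd
    if add_naa:
        size += naa
    return size


# Hypothetical background-corrected counts for one burst.
nd, na, naa = 20.0, 30.0, 25.0
n_dex = burst_size(nd, na, gamma=0.5)                      # donor-excitation photons only
n_t = burst_size(nd, na, naa, gamma=0.5, add_naa=True)     # all photons
print(n_dex, n_t)
```

With $\gamma = 0.5$ the donor counts contribute 10 photons, giving $n_{dex} = 40$ and $n_t = 65$.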
The burst search algorithm yields a set of bursts whose sizes
approximately follow an exponential distribution.
Compared to bursts with smaller sizes, bursts with large sizes are less frequent,
but contain more information per-burst (having higher SNR).
Therefore, selecting bursts by size is an important step (see \DIFdelbegin \DIFdel{section~}\DIFdelend \nameref{sec:burstsel}).
A threshold set too low may result in unresolvable sub-populations
because of broadening of FRET peaks and appearance of shot-noise artifacts
in the FRET (and \DIFdelbegin \DIFdel{S}\DIFdelend \DIFaddbegin \DIFadd{$S$}\DIFaddend ) distribution (i.e. spurious narrow peaks due to \DIFdelbegin \DIFdel{E and S }\DIFdelend \DIFaddbegin \DIFadd{$E$ and $S$ }\DIFaddend being
computed as the ratio of small integers).
Conversely, too large a threshold may result in too few bursts
and therefore in a poor representation of the FRET distribution.
Additionally, especially when computing fractions of sub-populations
(e.g. ratio of number of bursts in each sub-population),
it is important to use $\gamma$-corrected burst sizes as selection criterion,
in order to avoid under-representing some FRET sub-populations
due to different quantum yields of donor and acceptor dyes and/or
different photon detection efficiencies of donor and acceptor channels.
\DIFaddbegin \DIFadd{An alternative method to apply the $\gamma$ correction is to
discard a constant fraction of photons chosen randomly from either
the Dem or Aem photon stream~\mbox{%DIFAUXCMD
\cite{Nir_2006}}%DIFAUXCMD
. This
simple method transforms the measurement data in order to
achieve $\gamma=1$, overcoming the issue of selection bias between populations.
This approach also has the advantage of preserving
the binomial distribution of D and A photons in each burst, so that peaks
of FRET populations are easier to model statistically.
The only drawback is that, by discarding a fraction of photons,
this method leads to information loss and therefore to a potential
decrease in sensitivity and/or accuracy.
}
\DIFaddend A simple way to mitigate the dependence of the FRET distribution on
the burst size selection threshold is weighting bursts proportionally to their size
so that the bursts with largest sizes will have the largest weights.
Using size as weights (instead of any other monotonically increasing function
of size) can be justified by noticing that the variance of the burst proximity ratio (PR) is
inversely proportional to the burst size (see~\nameref{sec:burstweights_theory} for details). % SI_link
In general, a weighting scheme is used for building efficient estimators for a population
parameter (e.g. the population FRET efficiency $E_p$).
However, it can also be used to build weighted histograms or Kernel Density
Estimation (KDE) plots which emphasize FRET subpopulation peaks
without excluding small-size bursts.
Traditionally, for optimal results when not using weights, the
FRET histogram is manually adjusted by finding an ad-hoc (high)
size-threshold which selects only bursts with the highest size (and thus lowest variance).
Building size-weighted FRET histograms is a simple method to balance
the need to reduce the peak widths with the need to include as many bursts
as possible to reduce statistical noise.
As a practical example, by fixing the burst size threshold to a low value (e.g. 10--20 photons)
and using weights, it is possible to build a FRET histogram with well-defined FRET sub-population peaks
without the need to search for an optimal burst-size threshold (\nameref{sec:burstweights_theory}).
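
The effect of size weighting on a population-level estimate can be seen in a minimal sketch. The burst counts below are invented for illustration and $\gamma = 1$ is assumed; \verb|weighted_mean| is a hypothetical helper, not a FRETBursts function.

```python
def weighted_mean(values, weights):
    """Mean of `values` with each element weighted by `weights`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical (nd, na) counts: two large bursts and two small, noisy ones.
bursts = [(5, 5), (40, 10), (80, 20), (3, 9)]
pr = [na / (na + nd) for nd, na in bursts]      # per-burst proximity ratio
sizes = [na + nd for nd, na in bursts]          # burst sizes (gamma = 1)

plain = sum(pr) / len(pr)
weighted = weighted_mean(pr, sizes)
print(plain, weighted)
```

With weights equal to the size, the weighted mean of PR reduces to the total acceptor counts divided by the total counts, so the small bursts (which have the largest PR variance) pull the estimate much less than in the unweighted mean.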
\paragraph*{Python details}
FRETBursts has the option to weight bursts using $\gamma$-corrected
burst sizes which optionally include acceptor excitation photons \verb|naa|.
A weight proportional to the burst size is applied by passing the argument
\verb|weights='size'| to histogram or KDE plot functions. The \verb|weights|
keyword can also be passed to fitting functions in order to fit
the weighted E or S distributions (see section~\nameref{sec:fretfit}).
Other weighting functions (for example depending quadratically on the size)
are listed in the \verb|fret_fit.get_weights| documentation
(\href{http://fretbursts.readthedocs.org/en/latest/fret_fit.html#fretbursts.fret_fit.get_weights}{link}).
However, using weights different from the size is not recommended
due to their less efficient use of burst information
\DIFaddbegin \DIFadd{(}\nameref{sec:burstweights_theory}\DIFadd{)}\DIFaddend .
\section*{smFRET Burst Analysis}
\label{sec:analysis}
\subsection*{Loading the Data}
\label{sec:dataload}
While FRETBursts can load several data file formats,
we encourage users to adopt the recently introduced Photon-HDF5
file format~\cite{Ingargiola2016}.
Photon-HDF5 is an HDF5-based, open format, specifically designed
for freely-diffusing smFRET and
other timestamp-based experiments.
Photon-HDF5 is a self-documented, platform- and language-independent binary format,
which supports compression and allows saving photon data (e.g. timestamps)
and measurement-specific metadata
(e.g. setup and sample information, authors, provenance, etc.).
Moreover, Photon-HDF5 is designed for long-term data preservation and aims
to facilitate data sharing
between different software and research groups.
All example data files provided with FRETBursts use the Photon-HDF5 format.
To load data from a Photon-HDF5 file, we use the function \verb|loader.photon_hdf5|
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html#fretbursts.loader.photon_hdf5}{link}):
\begin{lstlisting}
d = loader.photon_hdf5(filename)
\end{lstlisting}
\noindent
where \verb|filename| is a string containing the file path.
This command loads the measurement data into the variable \verb|d|,
a \verb|Data| object (see section~\nameref{sec:data_intro}).
The same command can load data from a variety of smFRET measurements supported
by the Photon-HDF5 format, taking advantage of the rich metadata included with each file.
For instance, data generated using different excitation schemes such as CW excitation
or pulsed excitation, single-laser vs two alternating lasers, etc.,
or with any number of excitation spots, are automatically recognized and interpreted accordingly.
FRETBursts also supports loading μs-ALEX data stored in .sm files
(a custom binary format used in the Weiss lab) and
ns-ALEX data stored in .spc files (a binary format used by TCSPC Becker \& Hickl acquisition hardware).
Alternatively, these and other formats (such as ht3, a binary format used by PicoQuant hardware)
can be converted into Photon-HDF5 files using phconvert,
a file conversion library and utility for Photon-HDF5
(\href{http://photon-hdf5.github.io/phconvert/}{link}).
More information on loading different file formats
can be found in the \verb|loader| module's documentation
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html}{link}).
\subsection*{Alternation Parameters}
\label{sec:alternation}
For μs-ALEX and ns-ALEX data, Photon-HDF5 normally stores parameters defining
alternation periods corresponding to donor and acceptor laser excitation.
At load time, a user can plot these parameters and change them if deemed necessary.
In μs-ALEX measurements~\cite{Kapanidis_2004},
CW laser lines are alternated on timescales of the order of 10 to 100~μs.
By plotting a histogram of timestamps modulo the alternation period, it
is possible to identify the donor and acceptor excitation periods (see figure~\ref{fig:altern_hist_double}a).
In ns-ALEX measurements~\cite{Laurence_2005},
pulsed lasers with equal repetition rates are delayed with respect
to one another with typical delays of 10 to 100~ns.
In this case, forming a histogram of TCSPC times (nanotimes) will allow
the definition of periods of fluorescence after excitation
of either the donor or the acceptor (see figure~\ref{fig:altern_hist_double}b).
In both cases, the function
\verb|plot_alternation_hist|
(\href{http://fretbursts.readthedocs.org/en/latest/plots.html#fretbursts.burst_plot.plot_alternation_hist}{link})
will plot the relevant alternation histogram (figure~\ref{fig:altern_hist_double})
using currently selected (or default) values for donor and acceptor excitation periods.
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1\columnwidth]{figures/ALEX_alternation_double/ALEX_alternation_double}
\caption{\label{fig:altern_hist_double}
\textbf{Alternation histograms for μs-ALEX and ns-ALEX measurements.}
Histograms used for the selection/determination
of the alternation periods for two typical smFRET-ALEX experiments.
Distributions of photons detected by donor channel are in \textit{green},
and by acceptor channel in \textit{red}.
The light \textit{green} and \textit{red} shaded areas indicate the donor
and acceptor period definitions.
(a) μs-ALEX alternation histogram, i.e. histogram of timestamps \textit{modulo}
the alternation period for a smFRET measurement.
(b) ns-ALEX TCSPC nanotime histogram for a smFRET measurement.
Both plots have been generated by the same plot function
(\texttt{plot\_alternation\_hist()}).
Additional information on these specific measurements can be found in the
attached notebook
(\href{http://nbviewer.jupyter.org/github/tritemio/fretbursts_paper/blob/master/notebooks/Figures\%20-\%20ALEX\%20histograms.ipynb}{link}).%
}
\end{center}
\end{figure}
To change the period definitions, we can type:
\begin{lstlisting}
d.add(D_ON=(2100, 3900), A_ON=(100, 1900))
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend where \verb|D_ON| and \verb|A_ON| are tuples (pairs of numbers) representing
the \textit{start} and \textit{stop} values for D or A excitation periods.
The previous command works for both μs-ALEX and ns-ALEX measurements.
After changing the parameters, a new alternation plot will show the updated
period definitions.
The alternation period definition can be applied to the data
using the function \verb|loader.alex_apply_period|
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html#fretbursts.loader.alex_apply_period}{link}):
\begin{lstlisting}
loader.alex_apply_period(d)
\end{lstlisting}
After this command, \verb|d| will contain only photons inside the defined excitation periods.
If the user needs to update the period definitions, the data file will need to be
reloaded and the steps above repeated as described.
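
The role of the period definitions can be sketched for the μs-ALEX case as follows. The function \verb|excitation_masks|, the alternation period of 4000 timestamp units and the half-open treatment of the interval bounds are assumptions made for this illustration; the sketch also ignores ranges that wrap around the end of the period.

```python
def excitation_masks(timestamps, period, d_on, a_on):
    """Flag photons falling in the donor or acceptor excitation range.

    The phase of each photon is its timestamp modulo the alternation period;
    d_on and a_on are (start, stop) pairs in the same units.
    """
    phase = [t % period for t in timestamps]
    d_mask = [d_on[0] <= p < d_on[1] for p in phase]
    a_mask = [a_on[0] <= p < a_on[1] for p in phase]
    return d_mask, a_mask


# Hypothetical timestamps (in clock units) and period.
timestamps = [150, 2500, 4050, 6200, 7950, 10000]
d_mask, a_mask = excitation_masks(timestamps, period=4000,
                                  d_on=(2100, 3900), a_on=(100, 1900))
kept = [t for t, d, a in zip(timestamps, d_mask, a_mask) if d or a]
print(kept)
```

Photons whose phase falls in the gaps between the two ranges (here the timestamps 4050, 7950 and 10000) belong to neither excitation period and are the ones discarded when the period definition is applied.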
\subsection*{Background Estimation}
\label{sec:bg_calc}
The first step of smFRET analysis involves estimating background rates.
For example, \DIFdelbegin \DIFdel{to compute the background }\DIFdelend \DIFaddbegin \DIFadd{the following command:
}
%DIF > Don't split command on two lines for PLOS
\begin{lstlisting}
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us='auto')
\end{lstlisting}
\noindent \DIFadd{estimates the background rates in windows of 30~s
using the default iterative algorithm for choosing the
fitting threshold (}\nameref{sec:bg_intro}\DIFadd{). %DIF > PLOS: remove section and use nameref
Beginner users can simply use the previous command and
proceed to burst search (}\nameref{sec:burstsearch}\DIFadd{). %DIF > PLOS: remove section and use nameref
For more advanced users, this section provides details on
the different background estimation and plotting functions
provided by FRETBursts.
}
\DIFadd{As a start, we show how to estimate the background }\DIFaddend every 30~s,
using a \DIFdelbegin \DIFdel{minimal }\DIFdelend \DIFaddbegin \DIFadd{fixed }\DIFaddend inter-photon delay \DIFdelbegin \DIFdel{fixed }\DIFdelend threshold of 2~ms
\DIFdelbegin \DIFdel{for the all photon streams, the corresponding command is}\DIFdelend \DIFaddbegin \DIFadd{(the same for all the photon streams)}\DIFaddend :
\begin{lstlisting}
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us=2000)
\end{lstlisting}
The first argument (\verb|bg.exp_fit|) is the function used to fit the
background rate for each photon stream (see section~\nameref{sec:bg_intro}).
The function
\verb|bg.exp_fit| estimates the background using a maximum likelihood estimation
(MLE) of the inter-photon delay distribution.
The second argument, \verb|time_s|, is the duration of the
\textit{background period} (section~\nameref{sec:bg_intro}) and the third, \verb|tail_min_us|,
is the minimum inter-photon delay to use when fitting the distribution to the specified model function.
To use different thresholds for each photon stream we pass a
tuple (i.e. a comma-separated list of values, \href{https://docs.python.org/3.5/tutorial/datastructures.html#tuples-and-sequences}{link}) instead of a scalar.
The recommended approach, however, is to automate the choice of threshold by passing
\verb|tail_min_us='auto'|, which uses a heuristic algorithm described in the
\textit{Background estimation} section of the μs-ALEX tutorial
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#Background-estimation}{link}).
Finally, it is possible to use a slower but rigorous approach for finding the optimal
threshold as described in~\nameref{sec:bg_opt_th}. % SI_link
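
To make the role of the inter-photon delay threshold concrete, the following sketch computes the maximum likelihood estimate of an exponential rate using only delays above a threshold. It is a simplified stand-in and not the code of \verb|bg.exp_fit|: the function name \verb|exp_tail_rate| and the example delays are hypothetical, and delays are expressed in seconds rather than microseconds.

```python
def exp_tail_rate(delays, tail_min):
    """MLE of the rate of an exponential distribution from its tail.

    Only delays >= tail_min are used. Because the exponential distribution
    is memoryless, the delays shifted by tail_min are again exponential,
    and the MLE of the rate is the inverse of their mean.
    """
    tail = [d - tail_min for d in delays if d >= tail_min]
    return len(tail) / sum(tail)


# Hypothetical inter-photon delays in seconds: the short ones mostly come
# from bursts and are excluded by the 2 ms threshold.
delays = [0.0005, 0.003, 0.004, 0.0001, 0.005]
rate = exp_tail_rate(delays, tail_min=0.002)
print(rate)  # estimated background rate in counts per second
```

A threshold that is too low lets burst photons bias the rate upwards, while one that is too high leaves few delays and a noisy estimate, which is why the choice of threshold matters.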
FRETBursts provides two kinds of plots to represent the background. One shows the histograms
of inter-photon delays compared to the fitted exponential distribution, as shown in
figure~\ref{fig:bg_dist_all} (see section~\nameref{sec:bg_intro} for details on the inter-photon distribution).
This plot is created with the command:
\begin{lstlisting}
dplot(d, hist_bg, period=0)
\end{lstlisting}
This command reflects the general form of plotting commands in FRETBursts
as described in~\nameref{sec:plotting}. % SI_link
Here we only note that the argument \verb|period| is an integer specifying the background
period to be plotted (when omitted, the default is 0, i.e. the first period).
Figure~\ref{fig:bg_dist_all} allows quick identification of pathological cases where the
background fitting procedure returns unreasonable values.
The second background-related plot represents a timetrace of background rates,
as shown in figure~\ref{fig:bg_timetrace}. This plot allows monitoring background rate variations
occurring during the measurement and is obtained with the command:
\begin{lstlisting}
dplot(d, timetrace_bg)
\end{lstlisting}
Normally, samples should have a fairly constant background rate as a function of time
as in figure~\ref{fig:bg_timetrace}(a). However, sometimes, non-ideal
experimental conditions can yield a time-varying background rate, as illustrated in
figure~\ref{fig:bg_timetrace}(b).
A possible reason for the observed behavior could be buffer evaporation from an open sample
\DIFdelbegin \DIFdel{or poorly }\DIFdelend \DIFaddbegin \DIFadd{(we strongly recommend using a }\DIFaddend sealed
observation chamber \DIFaddbegin \DIFadd{whenever possible)}\DIFaddend . Additionally,
cover-glass impurities can contribute to the background.
These impurities tend to bleach on timescales of minutes resulting in
background variations during the course of the measurement.
\paragraph*{Python details}
The estimated background rates are stored in the \verb|Data| attributes
\verb|bg_dd|, \verb|bg_ad| and \verb|bg_aa|, corresponding to photon
streams \verb|Ph_sel(Dex='Dem')|, \verb|Ph_sel(Dex='Aem')| and \verb|Ph_sel(Aex='Aem')|
respectively.
These attributes are lists of arrays (one array per excitation spot).
The arrays contain the estimated background rates in the different time windows
(background periods).
Additional background fitting functions (e.g. least-square fitting of inter-photon delay
histogram) are available in the \verb|bg| namespace
(i.e. the \verb|background| module,
\href{http://fretbursts.readthedocs.org/en/latest/background.html}{link}).
\subsection*{Burst Search}
\label{sec:burstsearch}
%\subsubsection*{Burst Search in FRETBursts}
%\label{sec:burstsearch_code}
Following background estimation, burst search is the next step of
the analysis.
In FRETBursts, a standard burst search using a single photon stream
(see section~\nameref{sec:burstsearch_intro}) is performed by calling the
\verb|Data.burst_search| method
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.burst_search}{link}).
For example, the following command:
\begin{lstlisting}
d.burst_search(F=6, m=10, ph_sel=Ph_sel('all'))
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend performs a burst search on all photons
(\verb|ph_sel=Ph_sel('all')|), with a count rate threshold equal to 6 times the
local background rate (\verb|F=6|), using 10 consecutive photons to compute the
local count rate (\verb|m=10|).
A different photon stream, threshold ($F$) or number of photons $m$ can be selected
by passing different values.
These parameters are a good general-purpose starting point for smFRET analysis
but they can be adjusted if needed.
Note that the previous burst search does not perform any burst size selection
(however, by definition, the minimum burst size is effectively $m$).
An additional parameter $L$ can be passed to impose a minimum burst
size before any correction.
However, it is recommended to select bursts only after \DIFdelbegin \DIFdel{background corrections
are applied}\DIFdelend \DIFaddbegin \DIFadd{applying background
corrections}\DIFaddend , as discussed in the next section~\nameref{sec:burstsel}.
It might sometimes be useful to specify a fixed photon-rate threshold, instead
of a threshold depending on the background rate, as in the previous example. In
this case, instead of $F$, the argument \verb|min_rate_cps| can be used to
specify the threshold (in counts-per-second). For example, a burst search with
a 50~kcps threshold is performed as follows:
\begin{lstlisting}
d.burst_search(min_rate_cps=50e3, m=10,
               ph_sel=Ph_sel('all'))
\end{lstlisting}
Finally, to perform a DCBS (or, in general, an AND-gate burst search,
see section~\nameref{sec:burstsearch_intro}) we use the function
\verb|burst_search_and_gate|
(\href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.burst_search_and_gate}{link}),
as illustrated in the following example:
\begin{lstlisting}
d_dcbs = bext.burst_search_and_gate(d, F=6, m=10)
\end{lstlisting}
The last command puts the burst search results in a new copy of the
\verb|Data| variable \verb|d|
(in this example \DIFdelbegin \DIFdel{, }\DIFdelend the copy is called \verb|d_dcbs|).
Since FRETBursts shares the timestamps and detectors arrays between
different copies of \verb|Data| objects, the memory usage is minimized, even when
several copies are created.
\paragraph*{Python details}
Note that, while \DIFdelbegin %DIFDELCMD < \verb|.burst_search()| %%%
\DIFdelend \DIFaddbegin \verb|d.burst_search()| \DIFaddend is a method of \verb|Data|,
\DIFdelbegin %DIFDELCMD < \verb|burst_search_and_gate| %%%
\DIFdelend \DIFaddbegin \verb|bext.burst_search_and_gate()| \DIFaddend is a function in the \verb|bext| module
taking a \verb|Data| object as a first argument and returning a new
\verb|Data| object.
The function \verb|burst_search_and_gate| accepts optional arguments,
\verb|ph_sel1| and \verb|ph_sel2|, whose default values correspond to the
classical DCBS photon stream selection (see section~\nameref{sec:burstsearch_intro}).
These arguments can be specified to select photon streams different from those used in
a classical DCBS.
The \verb|bext| module (\href{http://fretbursts.readthedocs.org/en/latest/plugins.html}{link})
collects ``plugin'' functions that provide additional algorithms
for processing \verb|Data| objects.
\subsection*{Bursts Corrections}
\label{sec:corrcoeff}
In μs-ALEX, there are 3 important correction parameters: $\gamma$-factor,
donor leakage into the acceptor channel
and acceptor direct excitation by the donor excitation laser~\cite{Lee_2005}.
These corrections can be applied to burst data by simply assigning values
to the respective \verb|Data| attributes:
\begin{lstlisting}
d.gamma = 0.85
d.leakage = 0.15
d.dir_ex = 0.08
\end{lstlisting}
These attributes can be assigned either before or after the burst search. In the
latter case, existing burst data is automatically updated using the new
correction parameters.
These correction factors can be used to display corrected FRET distributions.
However, when the goal is to fit the FRET efficiency of sub-populations,
it is simpler to fit the background-corrected
PR histogram and then correct the population-level PR value (see SI in~\cite{Lee_2005}).
Correcting PR of each population (instead of correcting the data in each burst)
avoids distortion of the FRET distribution and keeps peaks of
static FRET subpopulations closer to the ideal \DIFdelbegin \DIFdel{Binomial }\DIFdelend \DIFaddbegin \DIFadd{binomial }\DIFaddend statistics~\cite{Gopich_2007}.
FRETBursts implements the correction formulas for $E$ and $S$ in the functions
\verb|fretmath.correct_E_gamma_leak_dir| and \verb|fretmath.correct_S|
(\href{http://fretbursts.readthedocs.org/en/latest/fretmath.html}{link}).
A derivation of these correction formulas (using computer-assisted algebra)
can be found online as an interactive notebook (\href{http://nbviewer.jupyter.org/github/tritemio/notebooks/blob/master/Derivation%20of%20FRET%20and%20S%20correction%20formulas.ipynb}{link}).
\subsection*{Burst Selection}
\label{sec:burstsel}
After burst search, it is common to select bursts according to different
criteria. One of the most common is burst size.
For instance, to select bursts with more than 30 photons detected during the donor excitation
(computed after background correction), we use the following command:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size, th1=30)
\end{lstlisting}
The previous command creates a new \verb|Data| variable (\verb|ds|) containing
the selected bursts. \verb|th1| defines the lower bound for burst size, while
\verb|th2| defines the upper bound (when not specified, as in the previous example,
the upper bound is $+\infty$).
As before, the new object (\verb|ds|) will share the photon data
arrays with the original object (\verb|d|) in order to minimize the amount
of used memory.
The first argument of \verb|select_bursts|
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#burst-selection-methods}{link})
is a Python function implementing the ``selection rule'' (\verb|select_bursts.size| in this example);
all remaining arguments (only \verb|th1| in this case) are parameters of the selection rule.
The \verb|select_bursts| module
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html}{link})
contains numerous built-in selection functions
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#module-fretbursts.select_bursts}{link}).
For example,
\verb|select_bursts.ES|
is used to select a region on the E-S ALEX histogram,
\verb|select_bursts.width|
to select bursts based on their duration.
New custom criteria can be readily implemented by defining a new selection function,
which requires only a couple of lines of code (see the
\verb|select_bursts| module's source code for examples,
\href{https://github.com/tritemio/FRETBursts/blob/master/fretbursts/select_bursts.py}{link}).
Finally, different criteria can be combined sequentially.
For example, with the following commands:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
                     th1=50, th2=200)
dsw = ds.select_bursts(select_bursts.width,
                       th1=0.5e-3, th2=3e-3)
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend bursts in \verb|dsw|
will have sizes between 50 and 200 photons, and duration between 0.5 and 3~ms.
\paragraph*{Burst Size Selection}
In the previous section, we selected bursts by size, using only
photons detected in both D and A channels during D excitation (i.e. \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend photons),
as in eq.~\ref{eq:burstsize_dex}.
Alternatively, a threshold on the burst size computed including all photons
can be applied by adding $n_{aa}$ to the burst size (see eq.~\ref{eq:burstsize_allph}).
This is achieved
by passing \verb|add_naa=True| to the selection function.
The complete selection command is:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
                     th1=30, add_naa=True)
\end{lstlisting}
\DIFdelbegin %DIFDELCMD < \noindent %%%
\DIFdelend The result of this selection is plotted in figure~\ref{fig:alex_jointplot}.
When \verb|add_naa| is not specified,
as in the previous section, the default is \verb|add_naa=False|
(i.e. compute size using only \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend photons).
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/alex_jointplot/alex_jointplot}
\caption{\label{fig:alex_jointplot} \textbf{E-S histogram showing FRET, D-only and A-only populations.}
A 2-D ALEX histogram and marginal E and S histograms for a 40-bp dsDNA
with D-A distance of 17 bases (Donor dye: ATTO550, Acceptor dye: ATTO647N).
Bursts are selected with a size-threshold of 30 photons, including \DIFdelbeginFL \DIFdelFL{Aex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{A\textsubscript{ex} }\DIFaddendFL photons.
The plot is obtained with \texttt{alex\_jointplot(ds)}. The 2D E-S distribution plot (joint plot)
is a histogram with hexagonal bins, which reduce the binning artifacts (compared to square bins)
and naturally resembles a scatter-plot when the burst density is low
\DIFaddbeginFL \DIFaddFL{(see }\nameref{sec:plotting}\DIFaddFL{)}\DIFaddendFL .
Three populations are visible: FRET population (middle), D-only population (top left) and
A-only population (bottom, $S < 0.2$). Compare with figure~\ref{fig:alex_jointplot_fretsel}
where the FRET population has been isolated.%
}
\end{center}
\end{figure}
Another important parameter for defining the burst size is the $\gamma$-factor, i.e.
the imbalance between the donor and the acceptor channel signals. As noted in
section~\nameref{sec:burstsizeweights}, the $\gamma$-factor is
used to compensate for the bias due to the different fluorescence quantum yields of the D and A
fluorophores as well as the different photon-detection efficiencies of the D and A channels.
When $\gamma$ is significantly different from 1, neglecting its effect on burst size leads to
over-representing (in terms of number of bursts) one FRET population versus the others.
When the $\gamma$ factor is known \DIFaddbegin \DIFadd{(and $\ne 1$)}\DIFaddend , a more unbiased selection of different FRET
populations can be achieved by passing the argument \verb|gamma| to the
selection function:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
                     th1=15, gamma=0.65)
\end{lstlisting}
When not specified, $\gamma=1$ is assumed.
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend For more details on burst size selection, see the
\verb|select_bursts.size| documentation
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#fretbursts.select_bursts.size}{link}).
\paragraph*{Python details}
\DIFdelbegin \DIFdel{To }\DIFdelend \DIFaddbegin \DIFadd{The method to }\DIFaddend compute $\gamma$-corrected burst sizes (with
or without addition of \verb|naa|)
\DIFdelbegin \DIFdel{the method }\DIFdelend \DIFaddbegin \DIFadd{is }\DIFaddend \verb|Data.burst_sizes|
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.burst_sizes}{link})\DIFdelbegin \DIFdel{is used}\DIFdelend .
\paragraph*{Select the FRET Populations}
In smFRET-ALEX experiments, in addition to one or more FRET populations, there are always
donor-only (D-only) and acceptor-only (A-only) populations.
In most cases, these additional populations are not of interest and need to be filtered out.
In principle, using the E-S representation, D-only and A-only bursts
can be excluded by selecting bursts within a range of $S$ values (e.g. $S = 0.2$--$0.8$).
This approach, however, simply truncates the burst distribution with arbitrary
thresholds and is therefore not recommended for quantitative assessment of FRET
populations.
An alternative approach consists of applying two selection filters sequentially.
First, the A-only population is filtered out
by applying a threshold on the number of photons during D excitation (\DIFdelbegin \DIFdel{Dex}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}}\DIFaddend ).
Second, the D-only population is filtered out by applying a threshold on
the number of A photons during A excitation (\DIFdelbegin \DIFdel{AemAex}\DIFdelend \DIFaddbegin \DIFadd{A\textsubscript{ex}A\textsubscript{em}}\DIFaddend ).
The commands for these combined selections are:
\begin{lstlisting}
ds1 = d.select_bursts(select_bursts.size, th1=15)
ds2 = ds1.select_bursts(select_bursts.naa, th1=15)
\end{lstlisting}
Here, \DIFaddbegin \DIFadd{the }\DIFaddend variable \verb|ds2| contains the combined burst selection.
Figure~\ref{fig:alex_jointplot_fretsel} shows the resulting pure FRET
population obtained with the previous selection.
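The logic of composing the two filters can be sketched in plain Python as follows.
This is only a toy model of the selection: the dictionary keys
(\verb|nd|, \verb|na|, \verb|naa|) and the counts are hypothetical,
and the code does not call FRETBursts.
\begin{lstlisting}
# Hypothetical per-burst counts: nd, na (D-excitation) and naa (A-excitation)
bursts = [
    {"nd": 30, "na": 25, "naa": 40},  # FRET-like: D and A both present
    {"nd": 50, "na": 3, "naa": 2},    # D-only-like: no A-excitation signal
    {"nd": 1, "na": 4, "naa": 45},    # A-only-like: no D-excitation signal
]
th = 15

# Step 1: threshold on counts during D-excitation removes A-only bursts
step1 = [b for b in bursts if b["nd"] + b["na"] > th]
# Step 2: threshold on A-excitation counts removes D-only bursts
step2 = [b for b in step1 if b["naa"] > th]
print(len(bursts), len(step1), len(step2))
\end{lstlisting}
Because each filter acts on a different photon stream, the order of the two
steps does not change the final selection in this sketch.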
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/alex_jointplot_fretsel/alex_jointplot_fretsel}
\caption{\label{fig:alex_jointplot_fretsel}
\textbf{E-S histogram after filtering out D-only and A-only populations.}
2-D ALEX histogram after selection of FRET population
using the composition of two burst selection filters:
(1) selection of bursts with counts in \DIFdelbeginFL \DIFdelFL{Dex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{D\textsubscript{ex} }\DIFaddendFL stream larger than 15;
(2) selection of bursts with counts in \DIFdelbeginFL \DIFdelFL{AemAex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{A\textsubscript{ex}A\textsubscript{em} }\DIFaddendFL stream larger than 15.
Compare to figure~\ref{fig:alex_jointplot} where all burst populations
(FRET, D-only and A-only) are reported.%
}
\end{center}
\end{figure}
\subsection*{Population Analysis}
\label{sec:fretfit}
Typically, after burst selection, E or S histograms are fitted to a model.
The FRETBursts \verb|mfit| module allows fitting histograms of burst quantities
(i.e. E or S) with arbitrary models. In this context, a model is an object
specifying a function, the parameters varied during the fit
and optional constraints for these parameters. This concept of model
is taken from \textit{lmfit}~\cite{lmfit}, the underlying library used by
FRETBursts to perform the fits.
Models can be created from arbitrary functions.
\DIFdelbegin \DIFdel{By default,
FRETBursts allows using predefined }\DIFdelend \DIFaddbegin \DIFadd{FRETBursts includes predefined (i.e. built-in) }\DIFaddend models
such as 1 to 3 Gaussian peaks or two Gaussian peaks connected by a \DIFdelbegin \DIFdel{``bridge''.
}\DIFdelend \DIFaddbegin \DIFadd{flat plateau.
The latter is an empirical model that
can be used to more accurately fit the center values of two populations
when the peaks are connected by intermediate-FRET bursts
(for the analytical definition of this function see the documentation,
}\href{http://fretbursts.readthedocs.io/en/latest/mfit.html#fretbursts.mfit.factory_two_gaussians}{link}\DIFadd{).
}\DIFaddend Built-in models are created by calling a corresponding factory function
(\DIFdelbegin \DIFdel{names starting }\DIFdelend \DIFaddbegin \DIFadd{whose names start }\DIFaddend with \verb|mfit.factory_|) which initializes the parameters
with values and constraints suitable for E and S histogram fits
\DIFdelbegin \DIFdel{.
}\DIFdelend (see \textit{Factory Functions} documentation,
\href{http://fretbursts.readthedocs.org/en/latest/mfit.html#model-factory-functions}{link}).
As an example, we \DIFaddbegin \DIFadd{can }\DIFaddend fit the E histogram of bursts in the
\verb|ds| variable with two Gaussian peaks with the following command:
\begin{lstlisting}
bext.bursts_fitter(ds, 'E', binwidth=0.03,
                   model=mfit.factory_two_gaussians())
\end{lstlisting}
Replacing \verb|'E'| with \verb|'S'| will fit the S histogram instead.
The \verb|binwidth| argument specifies the histogram bin width and
the \verb|model| argument defines which model shall be used for
fitting.
All fitting results (including best-fit values, uncertainties, etc.)
are stored in the \verb|E_fitter| (or \verb|S_fitter|)
attribute of the \verb|Data| variable (named \verb|ds| here).
To print a comprehensive summary of the fit results, including
uncertainties, reduced $\chi^2$ and correlation between parameters,
\DIFdelbegin \DIFdel{the we }\DIFdelend \DIFaddbegin \DIFadd{we can }\DIFaddend use the following command:
\begin{lstlisting}
fit_res = ds.E_fitter.fit_res[0]
print(fit_res.fit_report())
\end{lstlisting}
Finally, to plot the fitted model together with the FRET histogram,
as shown in figure~\ref{fig:histfit}, we pass the parameter \verb|show_model=True|
to the \verb|hist_fret| function
\DIFdelbegin \DIFdel{as follows
(seesection}\DIFdelend \DIFaddbegin \DIFadd{(see}\DIFaddend ~\nameref{sec:plotting} for an introduction to plotting in FRETBursts):
\begin{lstlisting}
dplot(ds, hist_fret, show_model=True)
\end{lstlisting}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.49\columnwidth]{figures/hist_fit/hist_fit}
\caption{\label{fig:histfit} \textbf{FRET histogram fitted with two Gaussians.}
Example of a FRET histogram fitted with a 2-Gaussian model.
After performing the fit (see main text), the plot is generated
with \texttt{dplot(ds, hist\_fret, show\_model=True)}.%
}
\end{center}
\end{figure}
For more examples of fitting burst data and plotting results, refer to the
fitting section of the μs-ALEX notebook (\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#FRET-fit:-in-depth-example}{link}),
the \textit{Fitting Framework} section of the documentation
(\href{http://fretbursts.readthedocs.org/en/latest/fit.html}{link})
as well as the documentation for the \verb|bursts_fitter| function
(\href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.bursts_fitter}{link}).
\paragraph*{Python details}
Models returned by FRETBursts's factory functions (\verb|mfit.factory_*|)
are \verb|lmfit.Model| objects (\href{https://lmfit.github.io/lmfit-py/model.html}{link}).
Custom models can be created by calling \verb|lmfit.Model| directly.
When an \verb|lmfit.Model| is fitted, it returns a \verb|ModelResult| object
(\href{https://lmfit.github.io/lmfit-py/model.html#the-modelresult-class}{link}),
which contains all information related to the fit (model, data,
parameters with best values and uncertainties) and useful methods to operate on fit results.
FRETBursts stores the \verb|ModelResult| object of each excitation spot in the list
\verb|ds.E_fitter.fit_res|.
For instance, to obtain the reduced $\chi^2$ value of the E histogram fit in a
single-spot measurement \verb|d|, we use the following command:
\begin{lstlisting}
d.E_fitter.fit_res[0].redchi
\end{lstlisting}
Other useful attributes are \verb|aic| and \verb|bic| which contain
\DIFaddbegin \DIFadd{statistics for }\DIFaddend the Akaike information criterion (AIC)\DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{akaike_new_1974}
}%DIFAUXCMD
}\DIFaddend and the Bayesian information criterion (BIC)\DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{schwarz_estimating_1978}}%DIFAUXCMD
}\DIFaddend .
AIC and BIC \DIFdelbegin \DIFdel{allow comparing different models and
selecting the most appropriate for the dataat hand.
}\DIFdelend \DIFaddbegin \DIFadd{are general-purpose statistical criteria for comparing the
suitability of multiple non-nested models according to the data.
By penalizing models with a higher number of parameters, these criteria
strike a balance between achieving a high goodness of fit
and keeping the model complexity low to avoid overfitting.
}\DIFaddend
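To make the parameter penalty concrete, the following sketch compares two
hypothetical fits of the same histogram. It assumes the least-squares forms
$N\ln(\chi^2/N) + 2k$ for AIC and $N\ln(\chi^2/N) + k\ln N$ for BIC, which, to
our understanding, correspond to the definitions used by lmfit; the $\chi^2$
values, the number of bins and the helper names are made up for illustration.
\begin{lstlisting}
import math

def aic(chisqr, ndata, nvarys):
    """Akaike information criterion (assumed least-squares form)."""
    return ndata * math.log(chisqr / ndata) + 2 * nvarys

def bic(chisqr, ndata, nvarys):
    """Bayesian information criterion (assumed least-squares form)."""
    return ndata * math.log(chisqr / ndata) + math.log(ndata) * nvarys

# Hypothetical fits of the same 40-bin histogram
one_gauss = dict(chisqr=12.0, ndata=40, nvarys=3)
two_gauss = dict(chisqr=11.5, ndata=40, nvarys=6)

for name, fit in [("1 Gaussian", one_gauss), ("2 Gaussians", two_gauss)]:
    print(name, aic(**fit), bic(**fit))
\end{lstlisting}
Here the second model lowers $\chi^2$ only marginally while doubling the number
of parameters, so both criteria are lower (better) for the single-Gaussian model.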
Examples of definition and modification of fit models are provided in
the aforementioned μs-ALEX notebook
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#FRET-fit:-in-depth-example}{link}).
Users can also refer to the comprehensive lmfit documentation
(\href{http://lmfit.github.io/lmfit-py/}{link}).
\DIFdelbegin \section*{\DIFdel{Implementing Burst Variance Analysis}}
%DIFAUXCMD
\DIFdelend \DIFaddbegin \subsection*{\DIFadd{FRET Dynamics}}
\label{sec:dynamics}
\DIFaddend
\DIFdelbegin %DIFDELCMD < %DIFDELCMD < \label{sec:bva}%%%
%DIFDELCMD < %%%
\DIFdel{In this section, we describe how to implement burst variance analysis (BVA)
as described in~\mbox{%DIFAUXCMD
\cite{Torella_2011}}%DIFAUXCMD
.
FRETBursts provides well-tested, general-purpose functions for timestamps and burst data
manipulation and therefore simplifies implementing custom burst analysis algorithms such as BVA.
}%DIFDELCMD <
%DIFDELCMD < %%%
\subsection*{\DIFdel{BVA Overview}}
%DIFAUXCMD
\DIFdelend Single-molecule FRET histograms show more information than just mean FRET efficiencies.
While in general the presence of several peaks clearly indicates the existence of
multiple subpopulations, a single peak cannot a priori be associated with
a single population defined by a unique FRET efficiency without further analysis\DIFdelbegin \DIFdel{(such as, for instance, shot-noise analysis~\mbox{%DIFAUXCMD
\cite{Nir_2006,Antonik2006}}%DIFAUXCMD
)}\DIFdelend .
\DIFdelbegin \DIFdel{The FRET histogram of a single FRET population
has a minimum width set by shot noise
}\DIFdelend \DIFaddbegin \DIFadd{Shot-noise analysis~\mbox{%DIFAUXCMD
\cite{Nir_2006} }%DIFAUXCMD
or probability
distribution analysis (PDA)~\mbox{%DIFAUXCMD
\cite{Antonik2006,kalinin_probability_2007}
}%DIFAUXCMD
allow computing the minimum width of a static FRET population
}\DIFaddend (i.e. \DIFdelbegin \DIFdel{the width is }\DIFdelend caused by the statistics of discrete photon-detection events).
\DIFdelbegin \DIFdel{FRET distributions broader than the shot noise limit,
can be ascribed to either a static mixture of species with slightly different
FRET efficiencies, or to }\DIFdelend \DIFaddbegin \DIFadd{Typically, several mechanisms
contribute to the broadening of the experimental FRET peak
beyond the shot-noise limit. These include heterogeneities in the sample
resulting in a distribution of Förster radii,
or actual conformational changes giving rise to }\DIFaddend a \DIFdelbegin \DIFdel{specie undergoing dynamic transitions (e.
g. interconversion between multiple states, diffusion in a continuum of conformations, binding-unbinding events, etc.
).
When the single peak of a FRET distribution is wider than predicted from shot-noise, it is not possible to discriminate between the static and dynamic case without further analysis .
}\DIFdelend \DIFaddbegin \DIFadd{distribution
of D-A distances~\mbox{%DIFAUXCMD
\cite{sisamakis_accurate_2010}}%DIFAUXCMD
.
}
\DIFadd{Gopich and Szabo developed an elegant analytical model
for the FRET distribution of $M$ interconverting states
based on superposition of Gaussian peaks~\mbox{%DIFAUXCMD
\cite{gopich_fret_2010}}%DIFAUXCMD
.
Unfortunately, the method is not straightforward to apply to
freely-diffusing data as it requires a special selection
criterion for filtering bursts with quasi-Poisson rates.
Santoso~\mbox{%DIFAUXCMD
\cite{santoso_probing_2009} }%DIFAUXCMD
and Kalinin~\mbox{%DIFAUXCMD
\cite{Kalinin2010}
}%DIFAUXCMD
extended the PDA approach to estimate conversion rates between different
states by comparing FRET histograms as a function of the time-bin size.
In addition, Gopich and Szabo~\mbox{%DIFAUXCMD
\cite{Gopich2009, gopich_theory_2011} }%DIFAUXCMD
developed
a related method to compute conversion rates using
a likelihood function which depends on photon timestamps (overcoming
the time binning and FRET histogramming step and directly applicable
to freely-diffusing data).
For measurements that include lifetime, the multiparameter fluorescence
detection (MFD) method allows identifying dynamics from the deviation
from the linear relation between lifetime and E~\mbox{%DIFAUXCMD
\cite{sisamakis_accurate_2010}}%DIFAUXCMD
.
Hoffmann~\mbox{%DIFAUXCMD
\cite{hoffmann_quantifying_2011} }%DIFAUXCMD
proposed a method
called RASP (recurrence analysis of single particles) to extend
the timescale of detectable kinetics.
RASP computes the probability that two nearby bursts are due to
the same molecule and therefore allows setting a time-threshold
for considering consecutive bursts as the same single-molecule event.
}
\DIFadd{Other interesting approaches include combining smFRET and FCS
for detecting and quantifying kinetics on timescales much shorter
than the diffusion
time~\mbox{%DIFAUXCMD
\cite{laurence_correlation_2007,torres_measuring_2007,nettels_unfolded_2008}}%DIFAUXCMD
.
In addition, Bayes-based methods have been proposed to fit static
populations~\mbox{%DIFAUXCMD
\cite{devore_classic_2012,murphy_bayesian_2014}}%DIFAUXCMD
, or to study dynamics~\mbox{%DIFAUXCMD
\cite{kou_bayesian_2005}}%DIFAUXCMD
.
}
\DIFadd{Finally, two related methods for discriminating between static heterogeneity
and sub-millisecond dynamics are Burst Variance Analysis
(BVA) proposed by Torella~\mbox{%DIFAUXCMD
\cite{Torella_2011} }%DIFAUXCMD
and
the two-channel kernel density distribution estimator (2CDE) proposed by
Tomov~\mbox{%DIFAUXCMD
\cite{Tomov_2012}}%DIFAUXCMD
. The BVA method is described in the next section.
The 2CDE method, which has been implemented in FRETBursts, computes local
photon rates from timestamps within bursts using
Kernel Density Estimation (KDE)
(FRETBursts includes general-purpose functions
to compute KDE of photon timestamps in the }\verb|phrates| \DIFadd{module,
(}\href{http://fretbursts.readthedocs.io/en/latest/phrates.html}{link}\DIFadd{)).
From time variations of the local rates it is possible to
detect the occurrence of dynamics. In particular, the 2CDE method
builds, for each burst, a quantity $(E)_D$ (or $(1-E)_A$) which is equal
to the burst average $E$ when no dynamics is present, but is biased
toward a higher (or lower) value in the presence of dynamics. From these
quantities a burst ``estimator''
(called FRET-2CDE) is derived. For a user, the 2CDE method consists
of plotting the 2-D histogram of $E$ versus FRET-2CDE
and assessing the vertical position of the various populations:
populations centered around FRET-2CDE=10 have
no dynamics while populations biased towards higher FRET-2CDE values
have dynamics.
}
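The notion of a local photon rate computed by KDE can be illustrated with the
following generic sketch in plain Python. It uses a Gaussian kernel chosen purely
for illustration; it reproduces neither the kernel and estimators of the 2CDE
method nor the functions of the \verb|phrates| module. The function name
\verb|kde_rate| and the timestamps are hypothetical.
\begin{lstlisting}
import math

def kde_rate(t, timestamps, tau):
    """Gaussian-kernel estimate of the photon rate at time t (width tau)."""
    norm = 1.0 / (tau * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((t - ti) / tau) ** 2)
               for ti in timestamps)

# Hypothetical timestamps (arbitrary units): sparse photons and a dense burst
timestamps = [0, 50, 100, 200, 202, 204, 205, 207, 209, 210, 300, 350]

rate_in_burst = kde_rate(205, timestamps, tau=5)
rate_outside = kde_rate(50, timestamps, tau=5)
print(rate_in_burst, rate_outside)
\end{lstlisting}
The estimated rate is higher where photons are densely packed; tracking how such
local rates vary in time within a burst is the starting point for detecting dynamics.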
\DIFadd{The BVA and 2CDE methods are implemented
in two notebooks included with FRETBursts
(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{BVA link},
\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%202CDE%20Method.ipynb}{2CDE link}\DIFadd{).
To use them, a user needs to download the relevant notebook
and run the analysis therein.
The other methods mentioned in this section are not currently
implemented in FRETBursts.
However, users can implement additional methods
taking advantage of FRETBursts functions for burst analysis
and timestamps/bursts manipulation.
To facilitate this task, in the next section,
we show how to perform low-level analysis of timestamps and bursts data
by implementing the BVA method from scratch.
An additional example showing how to split bursts into constant time-bins
can be found in the respective FRETBursts notebook
(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Working%20with%20timestamps%20and%20bursts.ipynb}{link}\DIFadd{).
These examples serve as a guide for implementing new methods.
Researchers willing to implement new methods are welcome to ask questions
on GitHub or on the mailing list.
We also encourage sharing any new methods implemented in FRETBursts
for the benefit of the entire community.
}
\section*{\DIFadd{Implementing Burst Variance Analysis}}
\label{sec:bva}
\DIFadd{In this section, we describe how to implement burst variance analysis (BVA)
as described in~\mbox{%DIFAUXCMD
\cite{Torella_2011}}%DIFAUXCMD
.
FRETBursts provides well-tested, general-purpose functions for timestamps and burst data
manipulation and therefore simplifies implementing custom burst analysis algorithms such as BVA.
}
\subsection*{\DIFadd{BVA Overview}}
\DIFaddend The BVA method has been developed to \DIFdelbegin \DIFdel{address this issue, namely identifying }\DIFdelend \DIFaddbegin \DIFadd{identify }\DIFaddend the presence of dynamics
in FRET distributions~\cite{Torella_2011},
and has been successfully applied to identify biomolecular processes with
dynamics on the millisecond time-scale~\cite{Torella_2011, Robb_2013}.
The basic idea behind BVA is to subdivide bursts into contiguous burst chunks (sub-bursts)
comprising a fixed number $n$ of photons,
and to compare the empirical variance of acceptor counts of all sub-bursts in a burst,
with the theoretical shot-noise-limited variance.
An empirical sub-burst variance larger than the shot-noise-limited value indicates
the presence of dynamics. Since the estimate of the sub-burst variance is affected
by uncertainty, BVA provides an indication of a higher or lower probability
of observing dynamics.
In a FRET (sub-)population originating from a single static FRET efficiency,
the sub-burst acceptor counts $n_a$ can be modeled as a binomial-distributed random variable
$N_a \sim \operatorname{B}(n, E_p)$, where $n$ is the number of photons in each sub-burst and
$E_p$ is the estimated population proximity-ratio (PR).
Note that we can use the PR because, regardless of the molecular FRET efficiency,
the detected counts are partitioned between donor and acceptor channels according to
a binomial distribution with success probability equal to the PR.
The only approximation made here is neglecting the presence of background
(a reasonable approximation since the background counts are in general a
very small fraction of the total counts).
We refer the interested reader to~\cite{Torella_2011} for further discussion.
If $N_a$ follows a binomial distribution, the random variable $E_{\textrm{sub}} = N_a/n$
has the standard deviation reported in eq.~\ref{eq:binom_std}.
\begin{equation}
\label{eq:binom_std}
\operatorname{Std}(E_{\textrm{sub}}) = \left( \frac{E_p\,(1 - E_p)}{n} \right)^{1/2}
\end{equation}
BVA consists of four steps: 1) dividing bursts into consecutive sub-bursts
containing a constant number of consecutive photons~$n$, 2) computing the PR
of each sub-burst, 3) calculating the empirical standard deviation ($s_E$) of the sub-burst
PR in each burst, and 4) comparing $s_E$ to the expected standard deviation
of a shot-noise-limited distribution~(eq.~\ref{eq:binom_std}).
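The shot-noise-limited curve of eq.~\ref{eq:binom_std}, used as the reference in
step 4, can be evaluated with a few lines of plain Python. The function name
\verb|shot_noise_std| is an illustrative choice, not a FRETBursts function.
\begin{lstlisting}
import math

def shot_noise_std(E_p, n):
    """Std. dev. of E_sub = N_a / n when N_a ~ Binomial(n, E_p)."""
    return math.sqrt(E_p * (1 - E_p) / n)

n = 7  # photons per sub-burst
# Reference curve sampled at E_p = 0.0, 0.1, ..., 1.0
curve = [(k / 10, shot_noise_std(k / 10, n)) for k in range(11)]
for E_p, std in curve:
    print(E_p, std)
\end{lstlisting}
The curve vanishes at $E_p = 0$ and $E_p = 1$ and has its maximum at $E_p = 0.5$,
where it equals $1/\sqrt{4n}$.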
If, as in figure~\ref{fig:bva_static}, the observed FRET efficiency distribution
originates from a static mixture of sub-populations (of different
non-interconverting molecules) characterized by distinct FRET efficiencies,
$s_E$ of each burst is only affected by shot-noise and will follow the expected
standard deviation curve based on eq.~\ref{eq:binom_std}.
Conversely, if the observed distribution originates from biomolecules belonging to a single species,
which interconverts between different FRET sub-populations (over times comparable to the diffusion
time), as in figure~\ref{fig:bva_dynamic}, $s_E$ of each burst will be larger than the expected
shot-noise-limited standard deviation, and will be located above the shot-noise standard
deviation curve (right panel of figure~\ref{fig:bva_dynamic}).
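The difference between the static and the dynamic case can be reproduced with a
small simulation in plain Python. This is a toy model under strong assumptions:
no background, a fixed number of sub-bursts per burst, and, in the dynamic case,
a molecule that occupies one of two states ($E = 0.2$ or $E = 0.8$, chosen at
random) during each sub-burst. All names and parameter values are hypothetical.
\begin{lstlisting}
import math
import random
import statistics

random.seed(0)
n = 7           # photons per sub-burst
n_sub = 10      # sub-bursts per burst
n_bursts = 200  # simulated bursts

def sub_burst_E(E_state):
    """PR of one sub-burst: fraction of n photons detected as acceptor."""
    n_a = sum(random.random() < E_state for _ in range(n))
    return n_a / n

def burst_sE(pick_state):
    """Empirical std. dev. of the sub-burst PR in one simulated burst."""
    return statistics.pstdev([sub_burst_E(pick_state())
                              for _ in range(n_sub)])

# Static case: every sub-burst has E = 0.5
s_static = [burst_sE(lambda: 0.5) for _ in range(n_bursts)]
# Dynamic case (toy): each sub-burst is in a state with E = 0.2 or 0.8
s_dynamic = [burst_sE(lambda: random.choice([0.2, 0.8]))
             for _ in range(n_bursts)]

shot_noise = math.sqrt(0.5 * (1 - 0.5) / n)
mean_static = statistics.mean(s_static)
mean_dynamic = statistics.mean(s_dynamic)
print(shot_noise, mean_static, mean_dynamic)
\end{lstlisting}
In this sketch the average $s_E$ of the static bursts stays close to the
shot-noise value, while the average $s_E$ of the dynamic bursts lies clearly above it.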
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/ALEX_BVA_static/ALEX_BVA_static}
\caption{\label{fig:bva_static} \textbf{BVA distribution for a static mixture sample.}
The left panel shows the E-S histogram for a mixture of single-stranded DNA (20dT) and double-stranded DNA (20dT-20dA) molecules in 200 mM MgCl$_2$. The right panel shows the corresponding BVA plot. Since both 20dT and 20dT-20dA are stable and have no dynamics, the BVA plot shows $s_E$ peaks lying on the static standard deviation curve (\textit{red curve}).%
}
\end{center}
\end{figure}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/ALEX_BVA_dynamic/ALEX_BVA_dynamic}
\caption{\label{fig:bva_dynamic} \textbf{BVA distribution for a hairpin sample undergoing dynamics.}
The left panel shows the E-S histogram for a single-stranded DNA sample ($A_{31}$-TA, see~\cite{Tsukanov_2013}), designed to form a transient hairpin in 400 mM NaCl. The right panel shows the corresponding BVA plot. Since the transition between hairpin and open structure causes a significant change in FRET efficiency, $s_E$ lies largely above the static standard deviation curve (\textit{red curve}).%
}
\end{center}
\end{figure}
\subsection*{BVA Implementation}
The following paragraphs describe the low-level details involved in implementing BVA using FRETBursts.
The main goal is to illustrate a real-world example of accessing and manipulating timestamps and burst data.
For a ready-to-use BVA implementation users can refer to the corresponding notebook included with FRETBursts
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{link}).
\paragraph*{Python details}
For BVA implementation, two photon streams are needed: all-photons during donor excitation (\DIFdelbegin \DIFdel{Dex}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}}\DIFaddend )
and acceptor photons during donor excitation (\DIFdelbegin \DIFdel{DexAem}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}A\textsubscript{em}}\DIFaddend ).
These photon stream selections are obtained by computing boolean masks as follows
(see\DIFdelbegin \DIFdel{section}\DIFdelend ~\nameref{sec:burststimes}):
\begin{lstlisting}
Dex_mask = ds.get_ph_mask(ph_sel=Ph_sel(Dex='DAem'))
DexAem_mask = ds.get_ph_mask(ph_sel=Ph_sel(Dex='Aem'))
DexAem_mask_d = DexAem_mask[Dex_mask]
\end{lstlisting}
Here, the first two variables (\verb|Dex_mask| and \verb|DexAem_mask|)
select photons from the all-photons timestamps array,
while \verb|DexAem_mask_d| selects A-emitted photons from the
array of photons emitted during D-excitation. As shown below,
the latter is needed to count acceptor photons in burst chunks.
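The effect of indexing one mask with another can be seen on a toy photon stream.
The sketch below uses plain Python lists with hypothetical flag values; the list
comprehension plays the role of the boolean indexing
\verb|DexAem_mask[Dex_mask]| applied to arrays.
\begin{lstlisting}
# Toy all-photons stream: one flag per photon in each mask
# True if the photon was detected during D-excitation
Dex_mask = [True, True, False, True, False, True]
# True if detected during D-excitation AND in the acceptor channel
DexAem_mask = [False, True, False, True, False, False]

# Keep the acceptor flag only for the D-excitation photons
DexAem_mask_d = [a for a, d in zip(DexAem_mask, Dex_mask) if d]

# Acceptor counts in the first three D-excitation photons
A_D = sum(DexAem_mask_d[0:3])
print(DexAem_mask_d, A_D)
\end{lstlisting}
The resulting mask has one element per D-excitation photon, so it can be sliced
with indexes that refer to the D-excitation stream.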
Next, we need to express burst start-stop data as indexes of the D-excitation photon stream
(by default burst start-stop indexes refer to the all-photons timestamps array):
\begin{lstlisting}
ph_d = ds_FRET.get_ph_times(ph_sel=Ph_sel(Dex='DAem'))
bursts = ds_FRET.mburst[0]
bursts_d = bursts.recompute_index_reduce(ph_d)
\end{lstlisting}
Here, \verb|ph_d| contains the \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend timestamps, \verb|bursts| the original burst data and
\verb|bursts_d| the burst data with start-stop indexes relative to \verb|ph_d|.
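The re-indexing step can be pictured with the following toy sketch in plain Python.
It is not the implementation of \verb|recompute_index_reduce|: it only assumes
that the new start-stop indexes point to the first and last photons of the reduced
stream whose timestamps fall inside the burst. The helper name
\verb|reindex_burst| and the timestamps are hypothetical.
\begin{lstlisting}
from bisect import bisect_left, bisect_right

def reindex_burst(start_time, stop_time, times_reduced):
    """Indexes (inclusive) of the first/last reduced-stream photon in a burst."""
    istart = bisect_left(times_reduced, start_time)
    istop = bisect_right(times_reduced, stop_time) - 1
    return istart, istop

# Toy all-photons timestamps and a flag marking D-excitation photons
times_all = [10, 12, 15, 20, 21, 23, 30, 31]
is_dex = [True, False, True, True, False, True, True, False]
times_d = [t for t, m in zip(times_all, is_dex) if m]

# A burst spanning all-photons indexes 2..5 (timestamps 15..23)
istart_d, istop_d = reindex_burst(times_all[2], times_all[5], times_d)
print(times_d, istart_d, istop_d)
\end{lstlisting}
The same burst that spans indexes 2 to 5 in the all-photons array spans indexes
1 to 3 in the D-excitation stream.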
Finally, with the previous variables at hand, the BVA algorithm
can be easily implemented by computing the $s_E$ quantity for each burst:
\begin{lstlisting}
n = 7
E_sub_std = []
for burst in bursts_d:
    E_sub = []
    startlist = range(burst.istart, burst.istop + 2 - n, n)
    stoplist = [i + n for i in startlist]
    for start, stop in zip(startlist, stoplist):
        A_D = DexAem_mask_d[start:stop].sum()
        E = A_D / n
        E_sub.append(E)
    E_sub_std.append(np.std(E_sub))
\end{lstlisting}
Here, \verb|n| is the BVA parameter defining the number of photons in each burst chunk.
The outer loop iterates through bursts, while the inner loop iterates through sub-bursts.
The variables \verb|startlist| and \verb|stoplist| are the lists of start-stop indexes for
all sub-bursts in the current burst.
In the inner loop, \verb|A_D| and \verb|E| contain the number of acceptor photons and
the (uncorrected) FRET efficiency of the current sub-burst. Finally, for each burst, the standard deviation
of the sub-burst \verb|E| values (\verb|E_sub|) is appended to the list \verb|E_sub_std|.
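The bounds used to build \verb|startlist| can be checked on a small example.
The sketch below assumes that \verb|istop| is the index of the last photon of
the burst (inclusive); the helper name \verb|sub_burst_slices| is hypothetical.
\begin{lstlisting}
def sub_burst_slices(istart, istop, n):
    """Slice bounds of the complete n-photon chunks (istop inclusive)."""
    startlist = range(istart, istop + 2 - n, n)
    return [(i, i + n) for i in startlist]

# Burst with 21 photons (indexes 10..30): three complete chunks of 7
slices_21 = sub_burst_slices(10, 30, 7)
# Burst with 20 photons (indexes 10..29): the incomplete chunk is dropped
slices_20 = sub_burst_slices(10, 29, 7)
print(slices_21, slices_20)
\end{lstlisting}
Under this assumption the upper bound \verb|istop + 2 - n| guarantees that
only sub-bursts containing exactly \verb|n| photons are generated.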
By plotting the 2D distribution of $s_E$ (i.e. \verb|E_sub_std|) versus the average (uncorrected) E
we obtain the BVA plots of figures~\ref{fig:bva_static} and~\ref{fig:bva_dynamic}.
\section*{Conclusions}
\label{sec:conclusions}
FRETBursts is an open source and openly developed (see~\nameref{sec:dev}) implementation % SI_link
of established smFRET burst analysis methods
made available to the single-molecule community.
It implements several novel concepts which improve the analysis results, such as
time-dependent background estimation, background-dependent burst search threshold,
burst weighting and $\gamma$-corrected burst size selection.
More importantly, FRETBursts provides a library of thoroughly-tested functions
for timestamps and burst manipulation, making it an ideal platform for
developing and comparing new analytical techniques.
We envision FRETBursts both as a state-of-the-art burst analysis
software and as a platform for development and assessment of novel algorithms.
To underpin this envisioned role, FRETBursts is developed following modern
software engineering practices, such as the DRY principle
(\href{http://en.wikipedia.org/wiki/Don\%27t_repeat_yourself}{link})
to reduce duplication and the KISS principle
(\href{http://en.wikipedia.org/wiki/KISS_principle}{link})
to reduce over-engineering. Furthermore, to minimize the number of software errors~\cite{Merali_2010,Soergel_2015},
we employ defensive programming~\cite{Prli__2012} which includes code readability,
unit and regression testing and continuous integration~\cite{Eglen_2016}.
Finally, being open source, any scientist can inspect the source code,
fix errors, and adapt it to her own needs.
We believe that, in the single-molecule community,
standard open source software implementations, such as FRETBursts, can enhance
reliability and reproducibility of analysis and promote a faster adoption of novel methods,
while reducing the duplication of efforts among different groups.
\section*{Acknowledgments}
We thank Dr. Eyal Nir and Dr. Toma Tomov for support in the implementation of the 2CDE method \DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{and Dr. Achilles Kapanidis and Dr. Nicole Robb for providing
experimental data for testing the BVA implementation.
}\DIFaddend This work was supported by National Institutes of Health (NIH)
grants R01-GM95904 and R01-GM069709. Dr. Weiss discloses equity in
Nesher Technologies and intellectual property used in the research
reported here. The work at UCLA was conducted in Dr. Weiss's Laboratory.
\section*{Supporting Information}
\paragraph*{S1 Appendix.}
\label{sec:notebook}
{\bf Notebook Workflow.} A description of the notebook workflow used by FRETBursts.
\paragraph*{S2 Appendix.}
\label{sec:dev}
{\bf Development and Contributions.} A description of development philosophy and techniques
as well as how to contribute to the FRETBursts project.
\paragraph*{S3 Appendix.}
\label{sec:burststimes}
{\bf Timestamps and Burst Data.} General concepts of how timestamps and
bursts data are stored and handled in FRETBursts.
\paragraph*{S4 Appendix.}
\label{sec:plotting}
{\bf Plotting \texttt{Data}.} A description of the syntax used to perform
plots in FRETBursts \DIFaddbegin \DIFadd{and of the 2-D hexagonal-bin histogram used in E-S plots}\DIFaddend .
\paragraph*{S5 Appendix.}
\label{sec:bg_opt_th}
{\bf Background Estimation With Optimal Threshold.} A description of
the algorithm used by FRETBursts to compute the
optimal threshold for background estimation.
\paragraph*{S6 Appendix.}
\label{sec:burstweights_theory}
{\bf Burst Weights.} Theory underpinning the choice of using burst size
as weights for FRET estimation.
\nolinenumbers
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}
diff --git a/FRETBursts authorea latex/diff_928_161_fix.tex b/FRETBursts authorea latex/diff_928_161_fix.tex
new file mode 100644
index 0000000..abdf727
--- /dev/null
+++ b/FRETBursts authorea latex/diff_928_161_fix.tex
...
% Template for PLoS
%DIF LATEXDIFF DIFFERENCE FILE
%DIF DEL full_article_928.tex Tue Jun 28 13:25:24 2016
%DIF ADD full_article_161.tex Thu Jun 30 12:52:03 2016
% Version 3.1 February 2015
%
% To compile to pdf, run:
% latex plos.template
% bibtex plos.template
% latex plos.template
% latex plos.template
% dvipdf plos.template
%
% % % % % % % % % % % % % % % % % % % % % %
%
% -- IMPORTANT NOTE
%
% This template contains comments intended
% to minimize problems and delays during our production
% process. Please follow the template instructions
% whenever possible.
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% Once your paper is accepted for publication,
% PLEASE REMOVE ALL TRACKED CHANGES in this file and leave only
% the final text of your manuscript.
%
% There are no restrictions on package use within the LaTeX files except that
% no packages listed in the template may be deleted.
%
% Please do not include colors or graphics in the text.
%
% Please do not create a heading level below \subsection. For 3rd level headings, use \paragraph*{}.
%
% % % % % % % % % % % % % % % % % % % % % % %
%
% -- FIGURES AND TABLES
%
% Please include tables/figure captions directly after the paragraph where they are first cited in the text.
%
% DO NOT INCLUDE GRAPHICS IN YOUR MANUSCRIPT
% - Figures should be uploaded separately from your manuscript file.
% - Figures generated using LaTeX should be extracted and removed from the PDF before submission.
% - Figures containing multiple panels/subfigures must be combined into one image file before submission.
% For figure citations, please use "Fig." instead of "Figure".
% See http://www.plosone.org/static/figureGuidelines for PLOS figure guidelines.
%
% Tables should be cell-based and may not contain:
% - tabs/spacing/line breaks within cells to alter layout or alignment
% - vertically-merged cells (no tabular environments within tabular environments, do not use \multirow)
% - colors, shading, or graphic objects
% See http://www.plosone.org/static/figureGuidelines#tables for table guidelines.
%
% For tables that exceed the width of the text column, use the adjustwidth environment as illustrated in the example table in text below.
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% -- EQUATIONS, MATH SYMBOLS, SUBSCRIPTS, AND SUPERSCRIPTS
%
% IMPORTANT
% Below are a few tips to help format your equations and other special characters according to our specifications. For more tips to help reduce the possibility of formatting errors during conversion, please see our LaTeX guidelines at http://www.plosone.org/static/latexGuidelines
%
% Please be sure to include all portions of an equation in the math environment.
%
% Do not include text that is not math in the math environment. For example, CO2 will be CO\textsubscript{2}.
%
% Please add line breaks to long display equations when possible in order to fit size of the column.
%
% For inline equations, please do not include punctuation (commas, etc) within the math environment unless this is part of the equation.
%
% % % % % % % % % % % % % % % % % % % % % % % %
%
% Please contact [email protected] with any questions.
%
% % % % % % % % % % % % % % % % % % % % % % % %
\documentclass[10pt,letterpaper]{article}
\usepackage[top=0.85in,left=2.75in,footskip=0.75in]{geometry}
% Use adjustwidth environment to exceed column width (see example table in text)
\usepackage{changepage}
% Use Unicode characters when possible
%\usepackage[utf8]{inputenc}
% textcomp package and marvosym package for additional characters
\usepackage{textcomp,marvosym}
% fixltx2e package for \textsubscript
\usepackage{fixltx2e}
% amsmath and amssymb packages, useful for mathematical formulas and symbols
\usepackage{amsmath,amssymb}
% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}
% Use nameref to cite supporting information files (see Supporting Information section for more info)
\usepackage{nameref}
\usepackage{color}
\usepackage[colorlinks=true,
linkcolor=blue,
urlcolor=blue,
citecolor=black]{hyperref}
% line numbers
\usepackage[right]{lineno}
% ligatures disabled
\usepackage{microtype}
\DisableLigatures[f]{encoding = *, family = * }
% rotating package for sideways tables
\usepackage{rotating}
% Remove comment for double spacing
%\usepackage{setspace}
%\doublespacing
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{multirow,booktabs}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\usepackage[utf8]{inputenc}
\usepackage[ngerman,greek,english]{babel}
%% Neutralize any \includegraphics in the document, as PLOS does not allow figures in the final submission
\makeatletter
\let\orig@includegraphics\includegraphics
\AtBeginDocument{\let\includegraphics\PLOS@ignore}
\newcommand{\PLOS@ignore}[2][]{}
\makeatother
% Text layout
\raggedright
\setlength{\parindent}{0.5cm}
\textwidth 5.25in
\textheight 8.75in
% Bold the 'Figure #' in the caption and separate it from the title/caption with a period
% Captions will be left justified
\usepackage[aboveskip=1pt,labelfont=bf,labelsep=period,justification=raggedright,singlelinecheck=off]{caption}
% Use the PLoS provided BiBTeX style
\bibliographystyle{plos2015}
% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother
% Leave date blank
\date{}
% Header and Footer with logo
\usepackage{lastpage,fancyhdr,graphicx}
\usepackage{epstopdf}
\pagestyle{myheadings}
\pagestyle{fancy}
\fancyhf{}
\makeatletter
\lhead{\orig@includegraphics[width=2.0in]{PLOS-submission.eps}}
\makeatother
\rfoot{\thepage/\pageref{LastPage}}
\renewcommand{\footrule}{\hrule height 2pt \vspace{2mm}}
\fancyheadoffset[L]{2.25in}
\fancyfootoffset[L]{2.25in}
\lfoot{\sf PLOS}
%% Include all macros below
\newcommand{\lorem}{{\bf LOREM}}
\newcommand{\ipsum}{{\bf IPSUM}}
\usepackage{color}
\usepackage{listings}
\lstset{ %
backgroundcolor=\color{white}, % choose the background color
basicstyle=\footnotesize\ttfamily, % size of fonts used for the code
breaklines=true, % automatic line breaking only at whitespace
captionpos=b, % sets the caption-position to bottom
commentstyle=\color{OliveGreen}, % comment style
keywordstyle=\color{blue}, % keyword style
stringstyle=\color{black}, % string literal style
language=Python, % Set your language (you can change the language for each code-block optionally)
frame=l, %
xleftmargin=\fboxsep, %
xrightmargin=-\fboxsep, %
}
\hyphenation{smFRET}
\hyphenation{FRETBursts}
%% END MACROS SECTION
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF UNDERLINE PREAMBLE %DIF PREAMBLE
\RequirePackage[normalem]{ulem} %DIF PREAMBLE
\RequirePackage{color}\definecolor{RED}{rgb}{1,0,0}\definecolor{BLUE}{rgb}{0,0,1} %DIF PREAMBLE
\providecommand{\DIFaddtex}[1]{{\protect\color{blue}\uwave{#1}}} %DIF PREAMBLE
\providecommand{\DIFdeltex}[1]{{\protect\color{red}\sout{#1}}} %DIF PREAMBLE
%DIF SAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddbegin}{} %DIF PREAMBLE
\providecommand{\DIFaddend}{} %DIF PREAMBLE
\providecommand{\DIFdelbegin}{} %DIF PREAMBLE
\providecommand{\DIFdelend}{} %DIF PREAMBLE
%DIF FLOATSAFE PREAMBLE %DIF PREAMBLE
\providecommand{\DIFaddFL}[1]{\DIFadd{#1}} %DIF PREAMBLE
\providecommand{\DIFdelFL}[1]{\DIFdel{#1}} %DIF PREAMBLE
\providecommand{\DIFaddbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFaddendFL}{} %DIF PREAMBLE
\providecommand{\DIFdelbeginFL}{} %DIF PREAMBLE
\providecommand{\DIFdelendFL}{} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF
%DIF HYPERREF PREAMBLE %DIF PREAMBLE
\providecommand{\DIFadd}[1]{\texorpdfstring{\DIFaddtex{#1}}{#1}} %DIF PREAMBLE
\providecommand{\DIFdel}[1]{\texorpdfstring{\DIFdeltex{#1}}{}} %DIF PREAMBLE
%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF
\begin{document}
\vspace*{0.35in}
% Title must be 250 characters or less.
\begin{flushleft}
{\Large
\textbf\newline{\input{title}}
}
\newline
% Insert author names, affiliations and corresponding author email (do not include titles, positions, or degrees).
\\
Antonino Ingargiola\textsuperscript{1*},
Eitan Lerner\textsuperscript{1},
SangYoon Chung\textsuperscript{1},
Shimon Weiss\textsuperscript{1},
Xavier Michalet\textsuperscript{1},
\\
\bigskip
\textbf{1} Dept. Chemistry and Biochemistry, Univ. of California Los Angeles, Los Angeles, CA, USA
\bigskip
% Use the asterisk to denote corresponding authorship and provide email address in note below.
* [email protected]
\end{flushleft}
% Please keep the abstract below 300 words
\section*{Abstract}
Single-molecule Förster Resonance Energy Transfer (smFRET) allows
probing intermolecular interactions and conformational changes in
biomacromolecules, and represents an invaluable tool for studying
cellular processes at the molecular scale. smFRET experiments can
detect the distance between two fluorescent labels (donor and acceptor)
in the 3-10~nm range. In the commonly employed confocal geometry,
molecules are free to diffuse in solution. When a molecule traverses
the excitation volume, it emits a burst of photons, which can be detected
by single-photon avalanche diode (SPAD) detectors. The intensities of
donor and acceptor fluorescence can then be related to the distance
between the two fluorophores.
While recent years have seen a growing number of contributions
proposing improvements or new techniques in smFRET data analysis,
rarely have those publications been accompanied by software implementation.
In particular, despite the widespread application of smFRET, no complete
software package for smFRET burst analysis is freely available to date.
In this paper, we introduce FRETBursts, an open source software
for analysis of freely-diffusing smFRET data.
FRETBursts allows executing all the fundamental steps of smFRET bursts
analysis using state-of-the-art as well as novel techniques,
while providing an open, robust and well-documented implementation.
Therefore, FRETBursts represents an ideal platform for comparison
and development of new methods in burst analysis.
We employ modern software engineering principles in order to
minimize bugs and facilitate long-term maintainability.
Furthermore, we place a strong focus on reproducibility by relying on
Jupyter notebooks for FRETBursts execution.
Notebooks are executable documents capturing all the steps of the
analysis (including data files, input parameters, and results) and can
be easily shared to replicate complete smFRET analyses.
Notebooks allow beginners to execute complex workflows
and advanced users to customize the analysis for their own needs.
By bundling analysis description, code and results in a single document,
FRETBursts enables seamless sharing of analysis workflows
and results, encourages reproducibility and facilitates collaboration
among researchers in the single-molecule community.
% Please keep the Author Summary between 150 and 200 words
% Use first person. PLOS ONE authors please skip this step.
% Author Summary not valid for PLOS ONE submissions.
%\section*{Author Summary}
\linenumbers
\section*{Introduction}
\subsection*{Open Science and Reproducibility}
Over the past 20 years, single-molecule FRET (smFRET) has grown into one of the most
useful techniques in single-molecule spectroscopy~\cite{Weiss_1999,Hohlbein_2014}.
While it is possible to extract information on sub-populations using ensemble measurements
(e.g.~\cite{Lerner_2014,Rahamim_2015}),
smFRET's unique feature is its ability to straightforwardly resolve conformational
changes of biomolecules or measure binding-unbinding kinetics in heterogeneous
samples~\cite{Selvin_2000,Roy_2008,Schuler_2008,Sisamakis_2010,Haran_2012}.
smFRET measurements on freely diffusing molecules (the focus of this paper)
have the additional advantage, over measurements performed on immobilized molecules,
of probing molecules and processes without perturbation from surface
immobilization or additional functionalization needed for surface
attachment~\cite{Eggeling_1998,Dahan_1999}.
The increasing amount of work using freely-diffusing smFRET has motivated
a growing number of theoretical contributions to the specific topic of data
analysis~\cite{Fries_1998,Eggeling_2001,Zhang_2005,Gopich_2005,Lee_2005,Nir_2006,Antonik2006,Gopich_2007,Gopich_2008,Camley_2009,Santoso_2010,Torella_2011,Tomov_2012}.
Despite this profusion of publications, most research groups still rely on
their own implementation of a limited number of methods, with very little
collaboration or code sharing.
To clarify this statement, let us point out that our own group's past smFRET papers
merely mention the use of custom-made software without additional details~\cite{Lee_2005,Nir_2006}.
Even though some of these software tools are made available upon request,
or sometimes shared publicly on websites,
it remains hard to reproduce and validate results from different groups,
let alone build upon them.
Additionally, as new methods are proposed in literature,
it is generally difficult to quantify their performance compared to other methods.
An independent quantitative assessment
would require a complete reimplementation, an effort few groups can afford.
As a result, potentially useful analysis improvements
are either rarely or slowly adopted by the community.
In contrast with other established traditions such as
sharing protocols and samples, in the domain of scientific software,
we have relegated ourselves to islands of non-communication.
From a more general standpoint, the non-availability of the code
used to produce scientific results hinders reproducibility,
makes it impossible to review and validate the software's correctness
and prevents improvements and extensions by other scientists.
This situation, common in many disciplines,
represents a real impediment to scientific progress.
Since the pioneering work of the Donoho group in the 90's~\cite{Buckheit_1995},
it has become evident that developing and maintaining open source scientific software
for reproducible research is a critical requirement of the modern
scientific enterprise~\cite{Ince_2012,Vihinen_2015}.
%Peer-reviewed publications describing such software are also necessary~\cite{Pradal_2013},
%although the debate is still open on the most effective model for peer-reviewing this
%class of publications~\cite{Check_Hayden_2013,Check_Hayden_2015}
%(\href{https://software-carpentry.org/blog/2015/04/quality-is-free-getting-there-isnt.html}{Willson 2015})
%(\href{https://www.mozillascience.org/effective-code-review-for-journals}{Mills 2015})
%(\href{http://ivory.idyll.org/blog/2015-we-live-in-a-bubble.html}{Brown 2015} and \href{http://ivory.idyll.org/blog/on-code-review-of-scientific-code.html}{2013}).
Other disciplines have started tackling this issue~\cite{Eglen_2016},
and even in the single-molecule field a few recent publications have provided
software for analysis of surface-immobilized experiments~\cite{McKinney_2006,Bronson_2009,Greenfeld_2012,K_nig_2013,van_de_Meent_2014}.
For freely-diffusing smFRET experiments, although it is common to find mention of
``code available from the authors upon request'' in publications, there is a dearth
of such open source code, with, to our knowledge, the notable exception of a single
example~\cite{Murphy2014}.
To address this issue, we have developed FRETBursts,
an open source Python software for analysis of freely-diffusing single-molecule FRET measurements.
FRETBursts can be used, inspected and modified by anyone interested in using
state-of-the-art smFRET analysis methods or implementing modifications or completely new techniques.
FRETBursts therefore represents an ideal platform
for quantitative comparison of different methods for smFRET burst analysis.
Technically, a strong emphasis has been given to the reproducibility of complete analysis
workflows. FRETBursts uses Jupyter Notebooks~\cite{Shen_2014},
interactive and executable documents containing textual narrative, input parameters,
code, and computational results (tables, plots, etc.). A notebook thus captures the various analysis steps
in a document which is easy to share and execute.
To minimize the possibility of bugs being introduced inadvertently~\cite{Soergel_2015},
we employ modern software engineering techniques
such as unit testing and continuous integration~\cite{Wilson_2014,Eglen_2016}.
FRETBursts is hosted on GitHub~\cite{Blischak_2016,Prli__2012},
where users can write comments, report issues or contribute code.
In a related effort, we recently introduced Photon-HDF5~\cite{Ingargiola2016},
an open file format for timestamp-based single-molecule fluorescence
experiments. Another related open source tool is PyBroMo~\cite{Ingargiola_2016},
a freely-diffusing smFRET simulator which produces Photon-HDF5 files that are
directly analyzable with FRETBursts.
Together with all the aforementioned tools, FRETBursts contributes to the growing
ecosystem of open tools for reproducible science in the single-molecule field.
\subsection*{Paper Overview}
This paper is written as an introduction to smFRET burst analysis and
its implementation in FRETBursts.
The aim is to illustrate the specificities and
trade-offs involved in various approaches
with sufficient details to enable readers
to customize the analysis for their own needs.
After a brief overview of FRETBursts features (section~\nameref{sec:overview}),
we introduce essential concepts and terminology for smFRET burst analysis
(section~\nameref{sec:concepts}).
In section~\nameref{sec:analysis}, we illustrate the steps involved
in smFRET burst analysis: (i) data loading (section~\nameref{sec:dataload}),
(ii) definition of the excitation alternation periods
(section~\nameref{sec:alternation}), (iii) background correction
(section~\nameref{sec:bg_calc}), (iv) burst search
(section~\nameref{sec:burstsearch}),
(v) burst selection (section~\nameref{sec:burstsel}) and
(vi) FRET histogram fitting (section~\nameref{sec:fretfit}).
As an example
of implementation of an advanced data processing technique,
section~\nameref{sec:bva} walks the reader through implementing
Burst Variance Analysis (BVA)~\cite{Torella_2011}.
Finally, section~\nameref{sec:conclusions} summarizes what we believe
to be the strengths of FRETBursts software.
Throughout this paper,
links to relevant sections of documentation and other web resources
are displayed as ``(link)''.
In order to make the text more legible,
we have concentrated Python-specific details in paragraphs titled
\textit{Python details}. These subsections provide deeper insights for readers
already familiar with Python and can be initially skipped by readers who are not.
Finally, note that all commands and figures in this paper can be regenerated
using the accompanying notebooks
(\href{https://github.com/tritemio/fretbursts_paper}{link}).
\section*{FRETBursts Overview}
\label{sec:overview}
\subsection*{Technical Features}
FRETBursts can analyze smFRET measurements
from one or multiple excitation spots~\cite{Ingargiola_2013}. The supported
excitation schemes include single laser, alternating laser excitation (ALEX)
with either CW lasers (μs-ALEX~\cite{Kapanidis_2005})
or pulsed lasers (ns-ALEX~\cite{Laurence_2005} or
pulsed-interleaved excitation (PIE)~\cite{M_ller_2005}).
The software implements both standard and novel algorithms for smFRET data analysis
including background estimation as a function of time (including background accuracy
metrics), sliding-window burst search~\cite{Eggeling_1998},
dual-channel burst search (DCBS)~\cite{Nir_2006} and
modular burst selection methods based on user-defined criteria
(including a large set of pre-defined selection rules). Novel features include burst size
selection with $\gamma$-corrected burst sizes, burst weighting, and burst search with
a background-dependent threshold (in order to guarantee a minimal signal-to-background
ratio~\cite{Michalet_2012}).
Moreover, FRETBursts provides a large set of fitting options to characterize FRET subpopulations.
In particular, distributions of burst quantities (such as $E$ or $S$) can be assessed
through (1) histogram fitting (with arbitrary model functions),
(2) non-parametric weighted kernel density estimation (KDE), (3) weighted
expectation-maximization (EM), (4) maximum likelihood fitting using Gaussian models
or Poisson statistics. Finally, FRETBursts includes a large number of
predefined and customizable plot functions which (thanks to the \textit{matplotlib}
graphic library) produce publication quality plots in a wide range of formats.
Additionally, implementations of population dynamics analysis such
as Burst Variance Analysis (BVA)~\cite{Torella_2011} and two-channel
kernel density distribution estimator (2CDE)~\cite{Tomov_2012}
are available as FRETBursts notebooks
\DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{BVA link},
\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%202CDE%20Method.ipynb}{2CDE link}\DIFadd{).
}\DIFaddend
\subsection*{Software Availability}
FRETBursts is hosted and openly developed on GitHub. FRETBursts homepage
(\href{http://tritemio.github.io/FRETBursts}{link})
contains links to the various resources. \DIFaddbegin \DIFadd{Pre-built packages are provided for
Windows, OS X and Linux. }\DIFaddend Installation instructions
can be found in the Reference Documentation
(\href{http://fretbursts.readthedocs.org/en/latest/getting_started.html}{link}).
A description of FRETBursts execution using Jupyter notebooks is reported
in~\nameref{sec:notebook}. % SI_link
Detailed information on development style, testing strategies and
contributions guidelines are reported in~\nameref{sec:dev}. % SI_link
Finally, to facilitate evaluation and comparison with other software,
we set up an online service that allows executing FRETBursts
without requiring any installation on the user's computer (\href{https://github.com/tritemio/FRETBursts_notebooks#run-online}{link}).
\section*{Architecture and Concepts}
\label{sec:concepts}
In this section, we introduce some general burst analysis concepts
and notations used in FRETBursts.
\subsection*{Photon Streams}
\label{sec:ph_streams}
The raw data collected during a smFRET experiment consists of one or more arrays of
photon timestamps, whose temporal resolution is set by the acquisition hardware,
typically between 10 and 50 ns.
In single-spot measurements, all timestamps are stored in a single array. In multispot
measurements~\cite{Ingargiola_2013}, there are as many timestamps arrays
as excitation spots.
Each array contains timestamps from both donor (D) and acceptor (A) channels.
When alternating excitation lasers are used (ALEX measurements)~\cite{Lee_2005},
a further distinction between photons emitted during the D or A excitation periods can be made.
In FRETBursts, the corresponding sets of photons are called ``photon streams''
and are specified with a \verb|Ph_sel| object
(\href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{link}).
In non-ALEX smFRET data, there are 3 photon streams
(table~\ref{tab:ph_sel_smfret}), while in \DIFaddbegin \DIFadd{two-color }\DIFaddend ALEX data,
there are 5 streams (table~\ref{tab:ph_sel_alex}).
The \verb|Ph_sel| class (\href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{link})
allows the specification of any combination of photon streams.
For example, in ALEX measurements, the D-emission during A-excitation stream is
usually ignored because it does not contain any useful signal~\cite{Lee_2005}.
To select all photons except those in this stream, the syntax is
\verb|Ph_sel(Dex='DAem', Aex='Aem')|, which indicates selection of donor
and acceptor photons (\verb|DAem|) during donor excitation (\verb|Dex|) and only acceptor
photons (\verb|Aem|) during acceptor excitation (\verb|Aex|).
\begin{table}
\begin{tabular}{l|l}
Photon selection & code \\
\hline
All-photons & \verb|Ph_sel('all')|\\
D-emission & \verb|Ph_sel(Dex='Dem')|\\
A-emission & \verb|Ph_sel(Dex='Aem')|\\
\end{tabular}
\caption{\label{tab:ph_sel_smfret}Photon selection syntax (non-ALEX)}
\end{table}
\begin{table}
\begin{tabular}{l|l}
Photon selection & code \\
\hline
All-photons & \verb|Ph_sel('all')|\\
D-emission during D-excitation & \verb|Ph_sel(Dex='Dem')|\\
A-emission during D-excitation & \verb|Ph_sel(Dex='Aem')|\\
D-emission during A-excitation & \verb|Ph_sel(Aex='Dem')|\\
A-emission during A-excitation & \verb|Ph_sel(Aex='Aem')|\\
\end{tabular}
\caption{\label{tab:ph_sel_alex}Photon selection syntax (ALEX)}
\end{table}
\subsection*{Background Definitions}
\label{sec:bg_intro}
An estimation of the background rates is needed both to select a proper threshold for
burst search, and to correct the raw burst counts by \DIFdelbegin \DIFdel{subtraction of }\DIFdelend \DIFaddbegin \DIFadd{subtracting }\DIFaddend background counts.
The recorded stream of timestamps is the result of two processes: one characterized
by a high count rate, due to fluorescence photons of single molecules crossing the
excitation volume, and another characterized by a lower count rate, due to ``background
counts'' originating from detector dark counts, afterpulsing, out-of-focus molecules
and sample scattering and/or impurities~\cite{Edman_1996,Gopich_2008}.
The signature of these two types of processes can be
observed in the inter-photon delays distribution (i.e. the waiting times
between two subsequent timestamps) as illustrated in figure~\ref{fig:bg_dist_all}(a).
The ``tail'' of the distribution (a straight line in semi-log scale) corresponds
to exponentially-distributed time-delays, indicating that those counts are generated by a
Poisson process. At short
timescales, the distribution departs from the exponential due to the contribution
of the higher rate process of single molecules traversing the excitation volume.
To estimate the background rate (i.e. the inverse of the exponential time constant),
it is necessary to define a time-delay threshold above which the distribution
can be considered exponential.
Finally, a parameter estimation method needs to be specified, such as Maximum
Likelihood Estimation (MLE) or non-linear least squares curve fitting of
the time-delay histogram (both supported in FRETBursts).
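As a worked illustration of the MLE case (a sketch that assumes all delays above the threshold are exponentially distributed; the symbols $\tau_{th}$, $\delta_i$ and $N$ are introduced here only for this example), let $\delta_1, \dots, \delta_N$ be the inter-photon delays longer than the threshold $\tau_{th}$.
Because the exponential distribution is memoryless, the excess delays $\delta_i - \tau_{th}$ are themselves exponentially distributed with rate $\lambda_{bg}$, and the maximum likelihood estimate of the background rate is the inverse of their mean:
\begin{equation*}
\hat{\lambda}_{bg} = \frac{N}{\sum_{i=1}^{N} \left(\delta_i - \tau_{th}\right)}
\end{equation*}
For instance, a mean excess delay of 0.5~ms corresponds to an estimated background rate of 2000 counts per second.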
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.77\columnwidth]{figures/ph_delays_distrib_all/ph_delays_distrib_all}
\caption{\label{fig:bg_dist_all} \textbf{Inter-photon delays fitted with an exponential function.}
Experimental distributions of inter-photon delays (\textit{dots}) and
corresponding fits of the exponential tail (\textit{solid lines}).
(\textit{Panel a}) An example of inter-photon delays distribution (\textit{red dots}) and an exponential fit
of the tail of the distribution (\textit{black line}).
(\textit{Panel b}) Inter-photon delays distribution and exponential fit for different photon streams as obtained with \texttt{dplot(d, hist\_bg)}. The \textit{dots} represent the experimental histogram for the different photon streams. The \textit{solid lines} represent the corresponding exponential fit of the tail of the distributions. The legend shows abbreviations of the photon streams
and the fitted background rates.%
}
\end{center}
\end{figure}
It is advisable to monitor the background as a function of time
throughout the measurement, in order to account for possible variations.
Experimentally, we found that when the background is not constant,
it usually varies
on time scales of tens of seconds (see figure~\ref{fig:bg_timetrace}).
FRETBursts divides the acquisition in constant-duration time
windows called \textit{background periods} and computes the background rates for
each of these windows (see section~\nameref{sec:bg_calc}).
Note that FRETBursts uses these local background rates also during burst search,
in order to compute time-dependent burst detection thresholds
and for background correction of burst data (see section~\nameref{sec:burstsearch}).
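The idea of background periods can be sketched in a few lines of plain Python.
This is a minimal illustration and not the FRETBursts implementation: the function \verb|background_rates| and its arguments \verb|period| and \verb|tail_min| are hypothetical names, and the rate in each period is taken, as an assumption, to be the inverse of the mean excess delay above the threshold.
\begin{lstlisting}
def background_rates(timestamps, period, tail_min):
    """Estimate one background rate per fixed-duration period.

    timestamps: sorted photon arrival times (seconds).
    period: duration of each background period (seconds).
    tail_min: delays longer than this are treated as background.
    Returns a list of rates (counts/s); None for periods that
    contain no delay above the threshold.
    """
    if not timestamps:
        return []
    t_start = timestamps[0]
    n_periods = int((timestamps[-1] - t_start) // period) + 1
    excess = [[] for _ in range(n_periods)]
    for t0, t1 in zip(timestamps, timestamps[1:]):
        delay = t1 - t0
        if delay > tail_min:
            # Assign the delay to the period of its first photon.
            index = int((t0 - t_start) // period)
            excess[index].append(delay - tail_min)
    # Exponential tail: rate = 1 / (mean excess delay).
    return [len(e) / sum(e) if e else None for e in excess]


ts = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 10.5, 11.0, 11.5]
rates = background_rates(ts, period=10.0, tail_min=0.0)
\end{lstlisting}
In this toy input the photons are sparser in the first 10~s than afterwards, so the two periods receive different rates, which is the situation that a single global estimate would hide.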
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.91\columnwidth]{figures/background_timetrace/background_timetrace}
\caption{\label{fig:bg_timetrace} \textbf{Background rates as a function of time.}
Estimated background rate as a function of time for two μs-ALEX measurements.
Different colors represent different photon streams.
(\textit{Panel a}) A measurement performed with a sealed sample chamber
exhibiting a constant background as a function of time.
(\textit{Panel b}) A measurement performed on an unsealed sample exhibiting
significant background variations due to sample evaporation and/or
photobleaching (likely impurities on the cover-glass).
These plots are produced by the command
\texttt{dplot(d, timetrace\_bg)} after estimation of background.
Each data point in these figures is computed for a 30~s time window.%
}
\end{center}
\end{figure}
\subsection*{The \texttt{Data} Class}
\label{sec:data_intro}
The \verb|Data| class
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html}{link})
is the fundamental data container in FRETBursts. It contains the
measurement data and parameters (attributes) as well as several methods
for data analysis (background estimation, burst search, etc.).
All analysis results (bursts data, estimated parameters) are also stored
as \verb|Data| attributes.
There are 3 important ``burst counts'' attributes which contain
the number of photons detected in the donor or the acceptor channel
during donor or acceptor excitation (table~\ref{tab:data_n}).
The attributes in table~\ref{tab:data_n} are background-corrected by default.
Furthermore, \verb|na| is corrected for leakage and direct excitation
(section~\nameref{sec:corrcoeff}) if the corresponding coefficients are specified
(by default they are 0).
There is also a closely related attribute named \verb|nda| for donor photons
during acceptor excitation. \verb|nda| is normally neglected as it only contains
background.
\begin{table}
\begin{tabular}{l p{0.8\columnwidth}}
Name & Description \\
\hline
\verb|nd| & number of photons detected by the donor channel (during donor excitation period in ALEX case)\\
\verb|na| & number of photons detected by the acceptor channel (during donor excitation period in ALEX case)\\
\verb|naa| & number of photons detected by the acceptor channel during acceptor excitation period (present only in ALEX measurements)\\
\end{tabular}
\caption{\label{tab:data_n}\texttt{Data} attributes names and descriptions for burst photon counts in different photon streams.}
\end{table}
\paragraph*{Python details}
Many \verb|Data| attributes are lists of arrays (or scalars) with the length of the lists
equal to the number of excitation spots. This means that in
single-spot measurements, an array of burst-data
is accessed by specifying the index as 0, for example \verb|Data.nd[0]|.
\verb|Data| implements a shortcut syntax to access the first element of a list
with an underscore, so that an equivalent syntax is
\verb|Data.nd_| instead of \verb|Data.nd[0]|.
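For readers curious about how such a trailing-underscore shortcut can be built, the following toy class shows one possible mechanism based on \verb|__getattr__|.
It is only a sketch: the class \verb|SpotData| is hypothetical and is not claimed to reproduce how \verb|Data| is actually implemented.
\begin{lstlisting}
class SpotData:
    """Toy container storing per-spot quantities as lists."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. for "nd_".
        if name.endswith("_") and not name.startswith("_"):
            values = self.__dict__.get(name[:-1])
            if values is not None:
                return values[0]
        raise AttributeError(name)


# Single-spot example: each list has exactly one element.
d = SpotData(nd=[[12, 30, 7]], na=[[5, 41, 9]])
first_spot_nd = d.nd_   # same object as d.nd[0]
\end{lstlisting}
Because \verb|__getattr__| is invoked only when the regular attribute lookup fails, ordinary attributes such as \verb|d.nd| are unaffected by the shortcut.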
\subsection*{Introduction to Burst Search}
\label{sec:burstsearch_intro}
Identifying single-molecule fluorescence bursts in the stream of photons is
one of the most crucial steps in the analysis of freely-diffusing single-molecule FRET data.
The widely used ``sliding window'' algorithm, introduced by the Seidel group in
1998~\cite{Eggeling_1998,Fries_1998}, involves searching for
$m$ consecutive photons detected during a period shorter than
$\Delta t$. In other words, bursts are regions of the photon stream where the
local rate (computed using $m$ photons) is above a minimum threshold rate.
Since a universal criterion to choose the rate threshold and
the number of photons $m$ is, as of today, lacking, it has become a common
practice to manually adjust those parameters for each specific measurement.
\DIFaddbegin \DIFadd{Commonly employed values for $m$ are between 5 and 15 photons.
}\DIFaddend
A more general approach consists in taking into account the background rate of
the specific measurements and in choosing a rate threshold that is $F$ times
larger than the background rate \DIFaddbegin \DIFadd{(typical values for $F$ are between 4 and 9)}\DIFaddend .
This approach ensures that all resulting bursts
have a signal-to-background ratio (SBR) larger than
$(F-1)$~\cite{Michalet_2012}. A consistent criterion for choosing the threshold is
particularly important when comparing different measurements with different background
rates, when the background significantly varies during measurements or in
multi-spot measurements where each spot has a different background rate.
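The sliding-window search with a background-dependent threshold can be sketched as follows.
This is an illustrative, unoptimized version and not the FRETBursts code: the function name \verb|burst_search| is hypothetical, and the local rate of a window is assumed to be $m$ divided by the time spanned by its $m$ photons (other conventions exist).
\begin{lstlisting}
def burst_search(timestamps, m, min_rate):
    """Return (start, stop) photon-index pairs, stop inclusive.

    A window of m consecutive photons is above threshold when
    it spans less than m / min_rate seconds; consecutive
    windows above threshold are merged into a single burst.
    """
    max_span = m / min_rate
    bursts = []
    in_burst = False
    for i in range(len(timestamps) - m + 1):
        span = timestamps[i + m - 1] - timestamps[i]
        above = span < max_span
        if above and not in_burst:
            start, in_burst = i, True
        elif not above and in_burst:
            # Last window above threshold started at i - 1.
            bursts.append((start, i + m - 2))
            in_burst = False
    if in_burst:
        bursts.append((start, len(timestamps) - 1))
    return bursts


ts = [0.0, 1.0, 2.0, 2.01, 2.02, 2.03, 2.04, 3.0, 4.0]
bg_rate = 1.0   # background rate (counts/s), assumed known
F = 5           # threshold is F times the background rate
bursts = burst_search(ts, m=3, min_rate=F * bg_rate)
\end{lstlisting}
Expressing the threshold as \verb|F * bg_rate| rather than as a fixed rate is what allows the same value of $F$ to be reused across measurements, or across background periods, with different background levels.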
A second important aspect of burst search is the choice of photon stream used
to perform the search.
In most cases, for instance when identifying FRET sub-populations,
the burst search should use all \DIFdelbegin \DIFdel{photons (i.e. APBS). In some }\DIFdelend \DIFaddbegin \DIFadd{the photons, the so-called
all-photon burst search (APBS)~\mbox{%DIFAUXCMD
\cite{Eggeling_1998,Fries_1998,Nir_2006}}%DIFAUXCMD
.
In }\DIFaddend other cases, \DIFaddbegin \DIFadd{for example }\DIFaddend when focusing on
donor-only or \DIFdelbegin \DIFdel{acceptor only }\DIFdelend \DIFaddbegin \DIFadd{acceptor-only }\DIFaddend populations, it is better to perform
the search using only donor or acceptor signal.
In order to handle the general case and to provide flexibility,
FRETBursts allows performing the burst search on arbitrary selections of photons
(see section~\nameref{sec:ph_streams} for more information on photon stream definitions).
Additionally, Nir~\textit{et al.}~\cite{Nir_2006} proposed \DIFdelbegin \DIFdel{DCBS (``}\DIFdelend \DIFaddbegin \DIFadd{a }\DIFaddend dual-channel
burst search \DIFdelbegin \DIFdel{'')
, }\DIFdelend \DIFaddbegin \DIFadd{(DCBS)
}\DIFaddend which can help mitigate artifacts due to photophysical effects such as blinking.
During DCBS, a search is performed \DIFdelbegin \DIFdel{in parallel }\DIFdelend on two photon streams
and bursts are defined as periods during which both photon streams
exhibit a rate higher than
the threshold, implementing the equivalent of an AND logic operation.
Conventionally, the term DCBS refers to a burst search where the two photon streams
are (1) all photons during donor excitation (\verb|Ph_sel(Dex='DAem')|) and
(2) acceptor channel photons during acceptor excitation (\verb|Ph_sel(Aex='Aem')|).
In FRETBursts, the user can choose arbitrary photon streams as input, and in general
this kind of search is called an ``AND-gate burst search''.
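The AND operation between two burst searches amounts to intersecting two lists of time intervals.
The sketch below illustrates this step only; the function \verb|and_gate| is a hypothetical helper (not part of the FRETBursts API) and it assumes that each input is a sorted list of non-overlapping \verb|(start, stop)| times.
\begin{lstlisting}
def and_gate(bursts_a, bursts_b):
    """Intersect two sorted lists of (start, stop) intervals."""
    result = []
    i = j = 0
    while i < len(bursts_a) and j < len(bursts_b):
        start = max(bursts_a[i][0], bursts_b[j][0])
        stop = min(bursts_a[i][1], bursts_b[j][1])
        if start < stop:
            # Both streams are above threshold in this period.
            result.append((start, stop))
        # Advance the interval that ends first.
        if bursts_a[i][1] < bursts_b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Burst times (s) found separately on two photon streams.
dex_bursts = [(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)]
aex_bursts = [(0.5, 2.5), (5.5, 7.0)]
common = and_gate(dex_bursts, aex_bursts)
\end{lstlisting}
Note that one burst in a stream can overlap several bursts in the other, so the output may contain more intervals than either input.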
After burst search, it is necessary to select
bursts, for instance by specifying a minimum number of photons (or burst size). In the most
basic form, this selection can be performed during burst search by discarding
bursts with size smaller than a threshold $L$ \DIFaddbegin \DIFadd{(typically 30 or higher)}\DIFaddend ,
as originally proposed by
Eggeling~\textit{et al.}~\cite{Eggeling_1998}.
This method, however, neglects the effect
of background and $\gamma$ factor on the burst size and can lead to a selection
bias for some channels and/or sub-populations.
For this reason, we suggest performing a burst size selection after background
correction, taking into account the $\gamma$ factor, as discussed in
sections~\nameref{sec:burstsizeweights} and~\nameref{sec:burstsel}.
In special cases, users may choose to replace (or combine)
the burst selection based on burst size
with another criterion such as burst duration or brightness (see section~\nameref{sec:burstsel}).
\subsection*{Corrected Burst Sizes and Weights}
\label{sec:burstsizeweights}
The number of photons detected during a burst --the ``burst size''--
is computed using either all photons, or only photons detected
during the donor excitation period. To compute the burst size, FRETBursts uses
one of the following formulas:
\begin{equation}
\label{eq:burstsize_dex}
n_{dex} = n_a + \gamma\,n_d
\end{equation}
\begin{equation}
\label{eq:burstsize_allph}
n_t = n_a + \gamma\,n_d + n_{aa}
\end{equation}
\noindent where $n_d$, $n_a$ and $n_{aa}$ are, similarly to the attributes
in table~\ref{tab:data_n}, the background-corrected
burst counts in different channels and excitation periods.
The factor $\gamma$ takes into account
different fluorescence quantum yields of donor and acceptor fluorophores and different
photon detection efficiencies between donor and acceptor detection
channels~\cite{Deniz_1999,Lee_2005}.
Eq.~\ref{eq:burstsize_dex} includes counts collected during donor excitation periods only,
while eq.~\ref{eq:burstsize_allph} includes all counts.
Burst sizes computed according to eq.~\ref{eq:burstsize_dex}
or~\ref{eq:burstsize_allph} are called $\gamma$-corrected burst sizes.
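The two definitions can be written as a short function. This is only a sketch of eq.~\ref{eq:burstsize_dex} and eq.~\ref{eq:burstsize_allph}: the function name \verb|corrected_size| is hypothetical, and the inputs are assumed to be already background-corrected counts.
\begin{lstlisting}
def corrected_size(nd, na, naa=0.0, gamma=1.0, add_naa=False):
    """Gamma-corrected burst size (hypothetical helper).

    nd, na, naa: background-corrected counts of one burst.
    Returns na + gamma*nd, plus naa when add_naa is True.
    """
    size = na + gamma * nd
    if add_naa:
        size += naa
    return size

# Burst with 40 donor, 20 acceptor and 30 A-excitation counts
print(corrected_size(nd=40, na=20, gamma=0.5))  # 40.0
print(corrected_size(nd=40, na=20, naa=30, gamma=0.5,
                     add_naa=True))             # 70.0
\end{lstlisting}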
The burst search algorithm yields a set of bursts whose sizes
approximately follow an exponential distribution.
Compared to bursts with smaller sizes, bursts with large sizes are less frequent,
but contain more information per-burst (having higher SNR).
Therefore, selecting bursts by size is an important step (see \DIFdelbegin \DIFdel{section~}\DIFdelend \nameref{sec:burstsel}).
A threshold set too low may result in unresolvable sub-populations
because of broadening of FRET peaks and appearance of shot-noise artifacts
in the FRET (and \DIFdelbegin \DIFdel{S}\DIFdelend \DIFaddbegin \DIFadd{$S$}\DIFaddend ) distribution (i.e. spurious narrow peaks due to \DIFdelbegin \DIFdel{E and S }\DIFdelend \DIFaddbegin \DIFadd{$E$ and $S$ }\DIFaddend being
computed as the ratio of small integers).
Conversely, too large a threshold may result in too low a number of bursts
and therefore in a poor representation of the FRET distribution.
Additionally, especially when computing fractions of sub-populations
(e.g. ratio of number of bursts in each sub-population),
it is important to use $\gamma$-corrected burst sizes as selection criterion,
in order to avoid under-representing some FRET sub-populations
due to different quantum yields of donor and acceptor dyes and/or
different photon detection efficiencies of donor and acceptor channels.
\DIFaddbegin \DIFadd{An alternative method to apply the $\gamma$ correction is to randomly
discard a constant fraction of photons chosen randomly from either
the Dem or Aem photon stream~\mbox{%DIFAUXCMD
\cite{Nir_2006}}%DIFAUXCMD
. This
simple method transforms the measurement data in order to
achieve $\gamma=1$, overcoming the issue of selection bias between populations.
This approach also has the advantage of preserving
the binomial distribution of D and A photons in each burst, so that peaks
of FRET populations are easier to model statistically.
The only drawback is that, by discarding a fraction of photons,
this method leads to information loss and therefore to a potential
decrease in sensitivity and/or accuracy.
}
\DIFaddend A simple way to mitigate the dependence of the FRET distribution on
the burst size selection threshold is weighting bursts proportionally to their size
so that the bursts with largest sizes will have the largest weights.
Using size as weights (instead of any other monotonically increasing function
of size) can be justified by noting that the variance of the burst proximity ratio (PR) is
inversely proportional to the burst size (see~\nameref{sec:burstweights_theory} for details). % SI_link
In general, a weighting scheme is used for building efficient estimators for a population
parameter (e.g. the population FRET efficiency $E_p$).
However, it can also be used to build weighted histograms or Kernel Density
Estimation (KDE) plots which emphasize FRET subpopulation peaks
without excluding small size bursts.
Traditionally, for optimal results when not using weights, the
FRET histogram is manually adjusted by finding an ad-hoc (high)
size-threshold which selects only bursts with the highest size (and thus lowest variance).
Building size-weighted FRET histograms is a simple method to balance
the need to reduce the peak width with the need to include as many bursts
as possible to reduce statistical noise.
As a practical example, by fixing the burst size threshold to a low value (e.g. 10-20 photons)
and using weights, it is possible to build a FRET histogram with well-defined FRET sub-population peaks
without the need to search for an optimal burst-size threshold (\nameref{sec:burstweights_theory}).
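The justification for using the burst size as weight can be sketched as follows, under the simplifying assumption (made here only for illustration) that the acceptor counts $n_{a,i}$ in a burst of size $n_i$ follow a binomial distribution with success probability equal to the population proximity ratio $E_p$. The PR of burst $i$, $E_i = n_{a,i}/n_i$, then has variance
\begin{equation}
\mathrm{Var}(E_i) = \frac{E_p\,(1 - E_p)}{n_i}
\end{equation}
\noindent and the weighted mean
\begin{equation}
\hat{E}_p = \frac{\sum_i w_i\,E_i}{\sum_i w_i}
\end{equation}
\noindent has minimum variance when the weights are inversely proportional to $\mathrm{Var}(E_i)$, i.e. when $w_i \propto n_i$. With this choice, $\hat{E}_p = \sum_i n_{a,i} / \sum_i n_i$, which is the ratio of the total acceptor counts to the total counts over all bursts.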
\paragraph*{Python details}
FRETBursts has the option to weight bursts using $\gamma$-corrected
burst sizes which optionally include acceptor excitation photons \verb|naa|.
A weight proportional to the burst size is applied by passing the argument
\verb|weights='size'| to histogram or KDE plot functions. The \verb|weights|
keyword can also be passed to fitting functions in order to fit
the weighted E or S distributions (see section~\nameref{sec:fretfit}).
Other weighting functions (for example depending quadratically on the size)
are listed in the \verb|fret_fit.get_weights| documentation
(\href{http://fretbursts.readthedocs.org/en/latest/fret_fit.html#fretbursts.fret_fit.get_weights}{link}).
However, using weights different from the size is not recommended
due to their less efficient use of burst information
\DIFaddbegin \DIFadd{(}\nameref{sec:burstweights_theory}\DIFadd{)}\DIFaddend .
\section*{smFRET Burst Analysis}
\label{sec:analysis}
\subsection*{Loading the Data}
\label{sec:dataload}
While FRETBursts can load several data file formats,
we encourage users to adopt the recently introduced Photon-HDF5
file format~\cite{Ingargiola2016}.
Photon-HDF5 is an HDF5-based, open format, specifically designed
for freely-diffusing smFRET and
other timestamp-based experiments.
Photon-HDF5 is a self-documented, platform- and language-independent binary format,
which supports compression and allows saving photon data (e.g. timestamps)
and measurement-specific metadata
(e.g. setup and sample information, authors, provenance, etc.).
Moreover, Photon-HDF5 is designed for long-term data preservation and aims
to facilitate data sharing
between different software and research groups.
All example data files provided with FRETBursts use the Photon-HDF5 format.
To load data from a Photon-HDF5 file, we use the function \verb|loader.photon_hdf5|
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html#fretbursts.loader.photon_hdf5}{link}):
\begin{lstlisting}
d = loader.photon_hdf5(filename)
\end{lstlisting}
\noindent
where \verb|filename| is a string containing the file path.
This command loads the measurement data into the variable \verb|d|,
a \verb|Data| object (see section~\nameref{sec:data_intro}).
The same command can load data from a variety of smFRET measurements supported
by the Photon-HDF5 format, taking advantage of the rich metadata included with each file.
For instance, data generated using different excitation schemes such as CW excitation
or pulsed excitation, single-laser vs two alternating lasers, etc.,
or with any number of excitation spots, are automatically recognized and interpreted accordingly.
FRETBursts also supports loading μs-ALEX data stored in .sm files
(a custom binary format used in the Weiss lab) and
ns-ALEX data stored in .spc files (a binary format used by TCSPC Becker \& Hickl acquisition hardware).
Alternatively, these and other formats (such as ht3, a binary format used by PicoQuant hardware)
can be converted into Photon-HDF5 files using phconvert,
a file conversion library and utility for Photon-HDF5
(\href{http://photon-hdf5.github.io/phconvert/}{link}).
More information on loading different file formats
can be found in the \verb|loader| module's documentation
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html}{link}).
\subsection*{Alternation Parameters}
\label{sec:alternation}
For μs-ALEX and ns-ALEX data, Photon-HDF5 normally stores parameters defining
alternation periods corresponding to donor and acceptor laser excitation.
At load time, a user can plot these parameters and change them if deemed necessary.
In μs-ALEX measurements~\cite{Kapanidis_2004},
CW laser lines are alternated on timescales of the order of 10 to 100~μs.
By plotting a histogram of timestamps modulo the alternation period, it
is possible to identify the donor and acceptor excitation periods (see figure~\ref{fig:altern_hist_double}a).
In ns-ALEX measurements~\cite{Laurence_2005},
pulsed lasers with equal repetition rates are delayed with respect
to one another with typical delays of 10 to 100~ns.
In this case, forming a histogram of TCSPC times (nanotimes) will allow
the definition of periods of fluorescence after excitation
of either the donor or the acceptor (see figure~\ref{fig:altern_hist_double}b).
In both cases, the function
\verb|plot_alternation_hist|
(\href{http://fretbursts.readthedocs.org/en/latest/plots.html#fretbursts.burst_plot.plot_alternation_hist}{link})
will plot the relevant alternation histogram (figure~\ref{fig:altern_hist_double})
using currently selected (or default) values for donor and acceptor excitation periods.
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1\columnwidth]{figures/ALEX_alternation_double/ALEX_alternation_double}
\caption{\label{fig:altern_hist_double}
\textbf{Alternation histograms for μs-ALEX and ns-ALEX measurements.}
Histograms used for the selection/determination
of the alternation periods for two typical smFRET-ALEX experiments.
Distributions of photons detected by donor channel are in \textit{green},
and by acceptor channel in \textit{red}.
The light \textit{green} and \textit{red} shaded areas indicate the donor
and acceptor period definitions.
(a) μs-ALEX alternation histogram, i.e. histogram of timestamps \textit{modulo}
the alternation period for a smFRET measurement.
(b) ns-ALEX TCSPC nanotime histogram for a smFRET measurement.
Both plots have been generated by the same plot function
(\texttt{plot\_alternation\_hist()}).
Additional information on these specific measurements can be found in the
attached notebook
(\href{http://nbviewer.jupyter.org/github/tritemio/fretbursts_paper/blob/master/notebooks/Figures\%20-\%20ALEX\%20histograms.ipynb}{link}).%
}
\end{center}
\end{figure}
To change the period definitions, we can type:
\begin{lstlisting}
d.add(D_ON=(2100, 3900), A_ON=(100, 1900))
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend where \verb|D_ON| and \verb|A_ON| are tuples (pairs of numbers) representing
the \textit{start} and \textit{stop} values for D or A excitation periods.
The previous command works for both μs-ALEX and ns-ALEX measurements.
After changing the parameters, a new alternation plot will show the updated
period definitions.
The alternation period definition can be applied to the data
using the function \verb|loader.alex_apply_period|
(\href{http://fretbursts.readthedocs.org/en/latest/loader.html#fretbursts.loader.alex_apply_period}{link}):
\begin{lstlisting}
loader.alex_apply_period(d)
\end{lstlisting}
After this command, \verb|d| will contain only photons inside the defined excitation periods.
If the user needs to update the period definitions, the data file will need to be
reloaded and the steps above repeated as described.
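The effect of applying the period definitions can be illustrated with a minimal sketch for the μs-ALEX case. This is not the FRETBursts implementation: the function name \verb|split_by_excitation| is hypothetical, the alternation period of 4000 timestamp units is an assumed value chosen to be compatible with the \verb|D_ON| and \verb|A_ON| values used above, and excitation windows that wrap around the end of the alternation period are not handled.
\begin{lstlisting}
def split_by_excitation(timestamps, period, d_on, a_on):
    """Assign timestamps to the D or A excitation period.

    All arguments use the same (arbitrary) time units.
    Photons outside both windows are discarded.
    """
    d_ex, a_ex = [], []
    for t in timestamps:
        phase = t % period  # position inside the period
        if d_on[0] <= phase < d_on[1]:
            d_ex.append(t)
        elif a_on[0] <= phase < a_on[1]:
            a_ex.append(t)
    return d_ex, a_ex

timestamps = [150, 2050, 2500, 4100, 4300, 6500, 7950]
d_ex, a_ex = split_by_excitation(
    timestamps, period=4000,
    d_on=(2100, 3900), a_on=(100, 1900))
print(d_ex)  # [2500, 6500]
print(a_ex)  # [150, 4100, 4300]
\end{lstlisting}
\noindent The photons at 2050 and 7950 fall in the transition regions between the two windows and are dropped, which is why the data must be reloaded before different period definitions can be applied.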
\subsection*{Background Estimation}
\label{sec:bg_calc}
The first step of smFRET analysis involves estimating background rates.
For example, \DIFdelbegin \DIFdel{to compute the background }\DIFdelend \DIFaddbegin \DIFadd{the following command:
}
%DIF > Don't split command on two lines for PLOS
\begin{lstlisting}
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us='auto')
\end{lstlisting}
\noindent \DIFadd{estimates the background rates in windows of 30~s
using the default iterative algorithm for choosing the
fitting threshold (}\nameref{sec:bg_intro}\DIFadd{). %DIF > PLOS: remove section and use nameref
Beginner users can simply use the previous command and
proceed to burst search (}\nameref{sec:burstsearch}\DIFadd{). %DIF > PLOS: remove section and use nameref
For more advanced users, this section provides details on
the different background estimation and plotting functions
provided by FRETBursts.
}
\DIFadd{As a start, we show how to estimate the background }\DIFaddend every 30~s,
using a \DIFdelbegin \DIFdel{minimal }\DIFdelend \DIFaddbegin \DIFadd{fixed }\DIFaddend inter-photon delay \DIFdelbegin \DIFdel{fixed }\DIFdelend threshold of 2~ms
\DIFdelbegin \DIFdel{for the all photon streams, the corresponding command is}\DIFdelend \DIFaddbegin \DIFadd{(the same for all the photon streams)}\DIFaddend :
\begin{lstlisting}
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us=2000)
\end{lstlisting}
The first argument (\verb|bg.exp_fit|) is the function used to fit the
background rate for each photon stream (see section~\nameref{sec:bg_intro}).
The function
\verb|bg.exp_fit| estimates the background using a maximum likelihood estimation
(MLE) of the delays distribution.
The second argument, \verb|time_s|, is the duration of the
\textit{background period} (section~\nameref{sec:bg_intro}) and the third, \verb|tail_min_us|,
is the minimum inter-photon delay to use when fitting the distribution to the specified model function.
To use different thresholds for each photon stream we pass a
tuple (i.e. a comma-separated list of values, \href{https://docs.python.org/3.5/tutorial/datastructures.html#tuples-and-sequences}{link}) instead of a scalar.
The recommended approach, however, is to automate the choice of threshold by passing
\verb|tail_min_us='auto'|, which uses a heuristic algorithm described in the
\textit{Background estimation} section of the μs-ALEX tutorial
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#Background-estimation}{link}).
Finally, it is possible to use a slower but rigorous approach for finding the optimal
threshold as described in~\nameref{sec:bg_opt_th}. % SI_link
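The principle of fitting the tail of the inter-photon delay distribution can be illustrated with a short sketch. It relies on a standard result: for exponentially distributed delays, the delays exceeding a threshold, minus the threshold, are again exponential with the same rate, so the MLE of the rate is the reciprocal of their mean. The function name \verb|exp_tail_rate_cps| is hypothetical, and the sketch is not meant to reproduce \verb|bg.exp_fit| in detail.
\begin{lstlisting}
def exp_tail_rate_cps(timestamps_us, tail_min_us):
    """Exponential MLE of the background rate (in cps).

    timestamps_us: sorted photon arrival times (microseconds).
    tail_min_us: only delays >= this value are used.
    """
    delays = [t1 - t0 for t0, t1
              in zip(timestamps_us, timestamps_us[1:])]
    tail = [d for d in delays if d >= tail_min_us]
    if not tail:
        raise ValueError("no delays above the threshold")
    mean_excess_us = sum(tail) / len(tail) - tail_min_us
    if mean_excess_us <= 0:
        raise ValueError("all delays equal the threshold")
    return 1e6 / mean_excess_us

# Delays: 500, 3000, 4000, 200, 5000 us
timestamps_us = [0, 500, 3500, 7500, 7700, 12700]
print(exp_tail_rate_cps(timestamps_us, tail_min_us=2000))
# 500.0
\end{lstlisting}
\noindent In this toy example the short delays (500 and 200~μs), which would typically come from bursts, are excluded by the threshold and do not bias the estimated rate.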
FRETBursts provides two kinds of plots to represent the background. One shows the histograms
of inter-photon delays compared to the fitted exponential distribution
(figure~\ref{fig:bg_dist_all}; see section~\nameref{sec:bg_intro} for details on the inter-photon delay distribution).
This plot is created with the command:
\begin{lstlisting}
dplot(d, hist_bg, period=0)
\end{lstlisting}
This command reflects the general form of plotting commands in FRETBursts
as described in~\nameref{sec:plotting}. % SI_link
Here we only note that the argument \verb|period| is an integer specifying the background
period to be plotted (when omitted, the default is 0, i.e. the first period).
Figure~\ref{fig:bg_dist_all} allows quick identification of pathological cases where the
background fitting procedure returns unreasonable values.
The second background-related plot represents a timetrace of background rates,
as shown in figure~\ref{fig:bg_timetrace}. This plot allows monitoring background rate variations
occurring during the measurement and is obtained with the command:
\begin{lstlisting}
dplot(d, timetrace_bg)
\end{lstlisting}
Normally, samples should have a fairly constant background rate as a function of time
as in figure~\ref{fig:bg_timetrace}(a). However, sometimes, non-ideal
experimental conditions can yield a time-varying background rate, as illustrated in
figure~\ref{fig:bg_timetrace}(b).
A possible reason for the observed behavior could be buffer evaporation from an open sample
\DIFdelbegin \DIFdel{or poorly }\DIFdelend \DIFaddbegin \DIFadd{(we strongly recommend using a }\DIFaddend sealed
observation chamber \DIFaddbegin \DIFadd{whenever possible)}\DIFaddend . Additionally,
cover-glass impurities can contribute to the background.
These impurities tend to bleach on timescales of minutes resulting in
background variations during the course of the measurement.
\paragraph*{Python details}
The estimated background rates are stored in the \verb|Data| attributes
\verb|bg_dd|, \verb|bg_ad| and \verb|bg_aa|, corresponding to photon
streams \verb|Ph_sel(Dex='Dem')|, \verb|Ph_sel(Dex='Aem')| and \verb|Ph_sel(Aex='Aem')|
respectively.
These attributes are lists of arrays (one array per excitation spot).
The arrays contain the estimated background rates in the different time windows
(background periods).
Additional background fitting functions (e.g. least-square fitting of inter-photon delay
histogram) are available in the \verb|bg| namespace
(i.e. the \verb|background| module,
\href{http://fretbursts.readthedocs.org/en/latest/background.html}{link}).
\subsection*{Burst Search}
\label{sec:burstsearch}
%\subsubsection*{Burst Search in FRETBursts}
%\label{sec:burstsearch_code}
Following background estimation, burst search is the next step of
the analysis.
In FRETBursts, a standard burst search using a single photon stream
(see section~\nameref{sec:burstsearch_intro}) is performed by calling the
\verb|Data.burst_search| method
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.burst_search}{link}).
For example, the following command:
\begin{lstlisting}
d.burst_search(F=6, m=10, ph_sel=Ph_sel('all'))
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend performs a burst search on all photons
(\verb|ph_sel=Ph_sel('all')|), with a count rate threshold equal to 6 times the
local background rate (\verb|F=6|), using 10 consecutive photons to compute the
local count rate (\verb|m=10|).
A different photon stream, threshold ($F$) or number of photons $m$ can be selected
by passing different values.
These parameters are a good general-purpose starting point for smFRET analysis
but can be adjusted if needed.
Note that the previous burst search does not perform any burst size selection
(however, by definition, the minimum burst size is effectively $m$).
An additional parameter $L$ can be passed to impose a minimum burst
size before any correction.
However, it is recommended to select bursts only after \DIFdelbegin \DIFdel{background corrections
are applied}\DIFdelend \DIFaddbegin \DIFadd{applying background
corrections}\DIFaddend , as discussed in the next section~\nameref{sec:burstsel}.
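The role of the parameters $F$ and $m$ can be illustrated with a minimal sliding-window sketch. This is not the FRETBursts implementation: the function name \verb|sliding_window_search| is hypothetical, a single constant background rate is assumed, and the local rate is estimated simply as $m$ divided by the duration of the window (other estimators are possible).
\begin{lstlisting}
def sliding_window_search(timestamps, bg_rate, F=6, m=10):
    """Return a list of (start, stop, size), one per burst.

    timestamps: sorted photon arrival times in seconds.
    A window of m consecutive photons is above threshold when
    m / duration >= F * bg_rate. Consecutive windows above
    threshold are merged into a single burst.
    """
    min_rate = F * bg_rate
    bursts = []
    start = stop = None  # photon indices of current burst
    for i in range(len(timestamps) - m + 1):
        duration = timestamps[i + m - 1] - timestamps[i]
        if duration * min_rate <= m:  # rate >= min_rate
            if start is None:
                start = i
            stop = i + m - 1
        elif start is not None:
            bursts.append((start, stop))
            start = None
    if start is not None:
        bursts.append((start, stop))
    return [(timestamps[i0], timestamps[i1], i1 - i0 + 1)
            for i0, i1 in bursts]

# 1 kcps background with 12 photons spaced by 50 us
ts_us = ([1000 * k for k in range(5)]
         + [5000 + 50 * k for k in range(12)]
         + [7000 + 1000 * k for k in range(5)])
timestamps = [t * 1e-6 for t in ts_us]
bursts = sliding_window_search(timestamps, 1000, F=6, m=5)
print(len(bursts), bursts[0][2])  # 1 12
\end{lstlisting}
\noindent Since every burst is built from at least one window of $m$ photons, no burst can contain fewer than $m$ photons, which is the effective minimum size mentioned above.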
It might sometimes be useful to specify a fixed photon-rate threshold, instead
of a threshold depending on the background rate, as in the previous example. In
this case, instead of $F$, the argument \verb|min_rate_cps| can be used to
specify the threshold (in counts-per-second). For example, a burst search with
a 50~kcps threshold is performed as follows:
\begin{lstlisting}
d.burst_search(min_rate_cps=50e3, m=10,
ph_sel=Ph_sel('all'))
\end{lstlisting}
Finally, to perform a DCBS burst search (or in general an AND gate burst search,
see section~\nameref{sec:burstsearch_intro}) we use the function
\verb|burst_search_and_gate|
(\href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.burst_search_and_gate}{link}),
as illustrated in the following example:
\begin{lstlisting}
d_dcbs = bext.burst_search_and_gate(d, F=6, m=10)
\end{lstlisting}
The last command puts the burst search results in a new copy of the
\verb|Data| variable \verb|d|
(in this example \DIFdelbegin \DIFdel{, }\DIFdelend the copy is called \verb|d_dcbs|).
Since FRETBursts shares the timestamps and detectors arrays between
different copies of \verb|Data| objects, the memory usage is minimized, even when
several copies are created.
\paragraph*{Python details}
Note that, while \DIFdelbegin %DIFDELCMD < \verb|.burst_search()| %%%
\DIFdelend \DIFaddbegin \verb|d.burst_search()| \DIFaddend is a method of \verb|Data|,
\DIFdelbegin %DIFDELCMD < \verb|burst_search_and_gate| %%%
\DIFdelend \DIFaddbegin \verb|bext.burst_search_and_gate()| \DIFaddend is a function in the \verb|bext| module
taking a \verb|Data| object as a first argument and returning a new
\verb|Data| object.
The function \verb|burst_search_and_gate| accepts optional arguments,
\verb|ph_sel1| and \verb|ph_sel2|, whose default values correspond to the
classical DCBS photon stream selection (see section~\nameref{sec:burstsearch_intro}).
These arguments can be specified to select different photon streams than those used in
a classical DCBS.
The \verb|bext| module (\href{http://fretbursts.readthedocs.org/en/latest/plugins.html}{link})
collects ``plugin'' functions that provide additional algorithms
for processing \verb|Data| objects.
\subsection*{Burst Corrections}
\label{sec:corrcoeff}
In μs-ALEX, there are 3 important correction parameters: $\gamma$-factor,
donor leakage into the acceptor channel
and acceptor direct excitation by the donor excitation laser~\cite{Lee_2005}.
These corrections can be applied to burst data by simply assigning values
to the respective \verb|Data| attributes:
\begin{lstlisting}
d.gamma = 0.85
d.leakage = 0.15
d.dir_ex = 0.08
\end{lstlisting}
These attributes can be assigned either before or after the burst search. In the
latter case, existing burst data is automatically updated using the new
correction parameters.
These correction factors can be used to display corrected FRET distributions.
However, when the goal is to fit the FRET efficiency of sub-populations,
it is simpler to fit the background-corrected
PR histogram and then correct the population-level PR value (see SI in~\cite{Lee_2005}).
Correcting the PR of each population (instead of correcting the data in each burst)
avoids distortion of the FRET distribution and keeps peaks of
static FRET subpopulations closer to the ideal \DIFdelbegin \DIFdel{Binomial }\DIFdelend \DIFaddbegin \DIFadd{binomial }\DIFaddend statistics~\cite{Gopich_2007}.
FRETBursts implements the correction formulas for $E$ and $S$ in the functions
\verb|fretmath.correct_E_gamma_leak_dir| and \verb|fretmath.correct_S|
(\href{http://fretbursts.readthedocs.org/en/latest/fretmath.html}{link}).
A derivation of these correction formulas (using computer-assisted algebra)
can be found online as an interactive notebook (\href{http://nbviewer.jupyter.org/github/tritemio/notebooks/blob/master/Derivation%20of%20FRET%20and%20S%20correction%20formulas.ipynb}{link}).
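As a minimal illustration of a population-level correction, consider the $\gamma$ correction alone. Leakage and direct excitation are assumed to be zero in this sketch, so it is not a substitute for the complete formulas referenced above. Writing the proximity ratio as $PR = n_a/(n_a + n_d)$ and the $\gamma$-corrected FRET efficiency as $E = n_a/(n_a + \gamma\,n_d)$, consistently with eq.~\ref{eq:burstsize_dex}, and eliminating the counts gives
\begin{equation}
E = \frac{PR}{PR + \gamma\,(1 - PR)}
\end{equation}
\noindent For example, a fitted peak position $PR = 0.5$ with $\gamma = 0.85$ corresponds to $E = 0.5/(0.5 + 0.425) \approx 0.54$, while for $\gamma = 1$ the expression reduces to $E = PR$.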
\subsection*{Burst Selection}
\label{sec:burstsel}
After burst search, it is common to select bursts according to different
criteria. One of the most common is burst size.
For instance, to select bursts with more than 30 photons detected during the donor excitation
(computed after background correction), we use the following command:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size, th1=30)
\end{lstlisting}
The previous command creates a new \verb|Data| variable (\verb|ds|) containing
the selected bursts. \verb|th1| defines the lower bound for burst size, while
\verb|th2| defines the upper bound (when not specified, as in the previous example,
the upper bound is $+\infty$).
As before, the new object (\verb|ds|) will share the photon data
arrays with the original object (\verb|d|) in order to minimize the amount
of used memory.
The first argument of \verb|select_bursts|
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#burst-selection-methods}{link})
is a python function implementing the ``selection rule'' (\verb|select_bursts.size| in this example);
all remaining arguments (only \verb|th1| in this case) are parameters of the selection rule.
The \verb|select_bursts| module
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html}{link})
contains numerous built-in selection functions
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#module-fretbursts.select_bursts}{link}).
For example,
\verb|select_bursts.ES|
is used to select a region on the E-S ALEX histogram,
\verb|select_bursts.width|
to select bursts based on their duration.
New custom criteria can be readily implemented by defining a new selection function,
which requires only a couple of lines of code (see the
\verb|select_bursts| module's source code for examples,
\href{https://github.com/tritemio/FRETBursts/blob/master/fretbursts/select_bursts.py}{link}).
Finally, different criteria can be combined sequentially.
For example, with the following commands:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
th1=50, th2=200)
dsw = ds.select_bursts(select_bursts.width,
th1=0.5e-3, th2=3e-3)
\end{lstlisting}
\DIFaddbegin \noindent \DIFaddend bursts in \verb|dsw|
will have sizes between 50 and 200 photons, and duration between 0.5 and 3~ms.
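The idea of passing a ``selection rule'' function together with its parameters, and of chaining selections, can be mimicked in a few lines of plain Python. The sketch below is purely illustrative: the names \verb|select|, \verb|by_size| and \verb|by_width|, the dictionary representation of a burst and the inclusive bounds are assumptions, not the FRETBursts API.
\begin{lstlisting}
def select(bursts, rule, **kwargs):
    """Keep the bursts for which rule(burst, **kwargs) is True."""
    return [b for b in bursts if rule(b, **kwargs)]

def by_size(burst, th1=0, th2=float("inf")):
    """Rule: number of D-excitation counts in [th1, th2]."""
    return th1 <= burst["nd"] + burst["na"] <= th2

def by_width(burst, th1=0, th2=float("inf")):
    """Rule: burst duration (seconds) in [th1, th2]."""
    return th1 <= burst["stop"] - burst["start"] <= th2

bursts = [
    {"start": 0.0100, "stop": 0.0110, "nd": 40, "na": 30},
    {"start": 0.0500, "stop": 0.0502, "nd": 45, "na": 25},
    {"start": 0.0900, "stop": 0.0915, "nd": 10, "na": 12},
]
ds = select(bursts, by_size, th1=50, th2=200)
dsw = select(ds, by_width, th1=0.5e-3, th2=3e-3)
print(len(ds), len(dsw))  # 2 1
\end{lstlisting}
\noindent Because each selection returns a new collection, rules can be applied in any order and intermediate results (here \verb|ds|) remain available for comparison.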
\paragraph*{Burst Size Selection}
In the previous section, we selected bursts by size, using only
photons detected in both D and A channels during D excitation (i.e. \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend photons),
as in eq.~\ref{eq:burstsize_dex}.
Alternatively, a threshold on the burst size computed including all photons
can be applied by adding $n_{aa}$ to the burst size (see eq.~\ref{eq:burstsize_allph}).
This is achieved
by passing \verb|add_naa=True| to the selection function.
The complete selection command is:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
th1=30, add_naa=True)
\end{lstlisting}
\DIFdelbegin %DIFDELCMD < \noindent %%%
\DIFdelend The result of this selection is plotted in figure~\ref{fig:alex_jointplot}.
When \verb|add_naa| is not specified,
as in the previous section, the default is \verb|add_naa=False|
(i.e. compute size using only \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend photons).
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/alex_jointplot/alex_jointplot}
\caption{\label{fig:alex_jointplot} \textbf{E-S histogram showing FRET, D-only and A-only populations.}
A 2-D ALEX histogram and marginal E and S histograms for a 40-bp dsDNA
with D-A distance of 17 bases (Donor dye: ATTO550, Acceptor dye: ATTO647N).
Bursts are selected with a size-threshold of 30 photons, including \DIFdelbeginFL \DIFdelFL{Aex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{A\textsubscript{ex} }\DIFaddendFL photons.
The plot is obtained with \texttt{alex\_jointplot(ds)}. The 2D E-S distribution plot (joint plot)
is a histogram with hexagonal bins, which reduce the binning artifacts (compared to square bins)
and naturally resembles a scatter-plot when the burst density is low
\DIFaddbeginFL \DIFaddFL{(see }\nameref{sec:plotting}\DIFaddFL{)}\DIFaddendFL .
Three populations are visible: FRET population (middle), D-only population (top left) and
A-only population (bottom, $S < 0.2$). Compare with figure~\ref{fig:alex_jointplot_fretsel}
where the FRET population has been isolated.%
}
\end{center}
\end{figure}
Another important parameter for defining the burst size is the $\gamma$-factor, i.e.
the imbalance between the donor and the acceptor channel signals. As noted in
section~\nameref{sec:burstsizeweights}, the $\gamma$-factor is
used to compensate for the different fluorescence quantum yields of the D and A
fluorophores as well as the different photon-detection efficiencies of the D and A channels.
When $\gamma$ is significantly different from 1, neglecting its effect on burst size leads to
over-representing (in terms of number of bursts) one FRET population versus the others.
When the $\gamma$ factor is known \DIFaddbegin \DIFadd{(and $\ne 1$)}\DIFaddend , a more unbiased selection of different FRET
populations can be achieved by passing the argument \verb|gamma| to the
selection function:
\begin{lstlisting}
ds = d.select_bursts(select_bursts.size,
th1=15, gamma=0.65)
\end{lstlisting}
When not specified, $\gamma=1$ is assumed.
\DIFdelbegin %DIFDELCMD <
%DIFDELCMD < %%%
\DIFdelend For more details on burst size selection, see the
\verb|select_bursts.size| documentation
(\href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#fretbursts.select_bursts.size}{link}).
\paragraph*{Python details}
\DIFdelbegin \DIFdel{To }\DIFdelend \DIFaddbegin \DIFadd{The method to }\DIFaddend compute $\gamma$-corrected burst sizes (with
or without addition of \verb|naa|)
\DIFdelbegin \DIFdel{the method }\DIFdelend \DIFaddbegin \DIFadd{is }\DIFaddend \verb|Data.burst_sizes|
(\href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.burst_sizes}{link})\DIFdelbegin \DIFdel{is used}\DIFdelend .
\paragraph*{Select the FRET Populations}
In smFRET-ALEX experiments, in addition to one or more FRET populations, there are always
donor-only (D-only) and acceptor-only (A-only) populations.
In most cases, these additional populations are not of interest and need to be filtered out.
In principle, using the E-S representation, D-only and A-only bursts
can be excluded by selecting bursts within a range of $S$ values (e.g. $S = 0.2$--$0.8$).
This approach, however, simply truncates the burst distribution with arbitrary
thresholds and is therefore not recommended for quantitative assessment of FRET
populations.
An alternative approach consists in applying two selection filters sequentially.
First, the A-only population is filtered out
by applying a threshold on the number of photons during D excitation (\DIFdelbegin \DIFdel{Dex}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}}\DIFaddend ).
Second, the D-only population is filtered out by applying a threshold on
the number of A photons during A excitation (\DIFdelbegin \DIFdel{AemAex}\DIFdelend \DIFaddbegin \DIFadd{A\textsubscript{ex}A\textsubscript{em}}\DIFaddend ).
The commands for these combined selections are:
\begin{lstlisting}
ds1 = d.select_bursts(select_bursts.size, th1=15)
ds2 = ds1.select_bursts(select_bursts.naa, th1=15)
\end{lstlisting}
Here, \DIFaddbegin \DIFadd{the }\DIFaddend variable \verb|ds2| contains the combined burst selection.
Figure~\ref{fig:alex_jointplot_fretsel} shows the resulting pure FRET
population obtained with the previous selection.
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.7\columnwidth]{figures/alex_jointplot_fretsel/alex_jointplot_fretsel}
\caption{\label{fig:alex_jointplot_fretsel}
\textbf{E-S histogram after filtering out D-only and A-only populations.}
2-D ALEX histogram after selection of FRET population
using the composition of two burst selection filters:
(1) selection of bursts with counts in \DIFdelbeginFL \DIFdelFL{Dex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{D\textsubscript{ex} }\DIFaddendFL stream larger than 15;
(2) selection of bursts with counts in \DIFdelbeginFL \DIFdelFL{AemAex }\DIFdelendFL \DIFaddbeginFL \DIFaddFL{A\textsubscript{ex}A\textsubscript{em} }\DIFaddendFL stream larger than 15.
Compare to figure~\ref{fig:alex_jointplot} where all burst populations
(FRET, D-only and A-only) are reported.%
}
\end{center}
\end{figure}
\subsection*{Population Analysis}
\label{sec:fretfit}
Typically, after burst selection, E or S histograms are fitted to a model.
The FRETBursts \verb|mfit| module allows fitting histograms of burst quantities
(i.e. E or S) with arbitrary models. In this context, a model is an object
specifying a function, the parameters varied during the fit
and optional constraints for these parameters. This concept of model
is taken from \textit{lmfit}~\cite{lmfit}, the underlying library used by
FRETBursts to perform the fits.
Models can be created from arbitrary functions.
\DIFdelbegin \DIFdel{By default,
FRETBursts allows using predefined }\DIFdelend \DIFaddbegin \DIFadd{FRETBursts includes predefined (i.e. built-in) }\DIFaddend models
such as 1 to 3 Gaussian peaks or 2-Gaussian connected by a \DIFdelbegin \DIFdel{``bridge''.
}\DIFdelend \DIFaddbegin \DIFadd{flat plateau.
The latter is an empirical model that
can be used to more accurately fit the center values of two populations
when the peaks are connected by intermediate-FRET bursts
(for the analytical definition of this function see the documentation,
}\href{http://fretbursts.readthedocs.io/en/latest/mfit.html#fretbursts.mfit.factory_two_gaussians}{link}\DIFadd{).
}\DIFaddend Built-in models are created by calling a corresponding factory function
(\DIFdelbegin \DIFdel{names starting }\DIFdelend \DIFaddbegin \DIFadd{whose names start }\DIFaddend with \verb|mfit.factory_|) which initializes the parameters
with values and constraints suitable for E and S histograms fits
\DIFdelbegin \DIFdel{.
}\DIFdelend (see \textit{Factory Functions} documentation,
\href{http://fretbursts.readthedocs.org/en/latest/mfit.html#model-factory-functions}{link}).
As an example, we \DIFaddbegin \DIFadd{can }\DIFaddend fit the E histogram of bursts in the
\verb|ds| variable with two Gaussian peaks with the following command:
\begin{lstlisting}
bext.bursts_fitter(ds, 'E', binwidth=0.03,
                   model=mfit.factory_two_gaussians())
\end{lstlisting}
Replacing \verb|'E'| with \verb|'S'| fits the S histogram instead.
The \verb|binwidth| argument specifies the histogram bin width and
the \verb|model| argument defines which model shall be used for
fitting.
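To make the notion of a model function concrete, the following is a minimal sketch of the kind of function a two-Gaussian-peak model evaluates on the histogram bin centers. It is not FRETBursts or lmfit code: the names \verb|gaussian| and \verb|two_gaussians|, their parameters and the numerical values are illustrative assumptions.

```python
import math


def gaussian(x, amplitude, center, sigma):
    """Gaussian peak of height `amplitude` centered at `center`."""
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def two_gaussians(x, a1, c1, s1, a2, c2, s2):
    """Sum of two Gaussian peaks (six free parameters)."""
    return gaussian(x, a1, c1, s1) + gaussian(x, a2, c2, s2)


# Evaluate the model at the centers of histogram bins of width 0.03
# (hypothetical parameter values, chosen only for illustration).
binwidth = 0.03
bin_centers = [binwidth * (i + 0.5) for i in range(34)]
curve = [two_gaussians(x, 1.0, 0.2, 0.05, 0.5, 0.8, 0.08)
         for x in bin_centers]
```

A fitting library varies the six parameters, within any constraints, until such a curve best matches the histogram counts.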
All fitting results (including best-fit values, uncertainties, etc.)
are stored in the \verb|E_fitter| (or \verb|S_fitter|)
attribute of the \verb|Data| variable (named \verb|ds| here).
To print a comprehensive summary of the fit results, including
uncertainties, reduced $\chi^2$ and correlation between parameters,
\DIFdelbegin \DIFdel{the we }\DIFdelend \DIFaddbegin \DIFadd{we can }\DIFaddend use the following command:
\begin{lstlisting}
fit_res = ds.E_fitter.fit_res[0]
print(fit_res.fit_report())
\end{lstlisting}
Finally, to plot the fitted model together with the FRET histogram,
as shown in figure~\ref{fig:histfit}, we pass the parameter \verb|show_model=True|
to the \verb|hist_fret| function
\DIFdelbegin \DIFdel{as follows
(seesection}\DIFdelend \DIFaddbegin \DIFadd{(see}\DIFaddend ~\nameref{sec:plotting} for an introduction to plotting in FRETBursts):
\begin{lstlisting}
dplot(ds, hist_fret, show_model=True)
\end{lstlisting}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.49\columnwidth]{figures/hist_fit/hist_fit}
\caption{\label{fig:histfit} \textbf{FRET histogram fitted with two Gaussians.}
Example of a FRET histogram fitted with a 2-Gaussian model.
After performing the fit (see main text), the plot is generated
with \texttt{dplot(ds, hist\_fret, show\_model=True)}.%
}
\end{center}
\end{figure}
For more examples of fitting burst data and plotting results, refer to the
fitting section of the μs-ALEX notebook (\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#FRET-fit:-in-depth-example}{link}),
the \textit{Fitting Framework} section of the documentation
(\href{http://fretbursts.readthedocs.org/en/latest/fit.html}{link})
as well as the documentation for \verb|bursts_fitter| function
(\href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.bursts_fitter}{link}).
\paragraph*{Python details}
Models returned by FRETBursts's factory functions (\verb|mfit.factory_*|)
are \verb|lmfit.Model| objects (\href{https://lmfit.github.io/lmfit-py/model.html}{link}).
Custom models can be created by calling \verb|lmfit.Model| directly.
When an \verb|lmfit.Model| is fitted, it returns a \verb|ModelResult| object
(\href{https://lmfit.github.io/lmfit-py/model.html#the-modelresult-class}{link}),
which contains all information related to the fit (model, data,
parameters with best values and uncertainties) and useful methods to operate on fit results.
FRETBursts puts a \verb|ModelResult| object for each excitation spot in the list
\verb|ds.E_fitter.fit_res|.
For instance, to obtain the reduced $\chi^2$ value of the E histogram fit in a
single-spot measurement \verb|d|, we use the following command:
\begin{lstlisting}
d.E_fitter.fit_res[0].redchi
\end{lstlisting}
Other useful attributes are \verb|aic| and \verb|bic| which contain
\DIFaddbegin \DIFadd{statistics for }\DIFaddend the Akaike information criterion (AIC)\DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{akaike_new_1974}
}%DIFAUXCMD
}\DIFaddend and the Bayesian information criterion (BIC)\DIFaddbegin \DIFadd{~\mbox{%DIFAUXCMD
\cite{schwarz_estimating_1978}}%DIFAUXCMD
}\DIFaddend .
AIC and BIC \DIFdelbegin \DIFdel{allow comparing different models and
selecting the most appropriate for the dataat hand.
}\DIFdelend \DIFaddbegin \DIFadd{are general-purpose statistical criteria for comparing the
suitability of multiple non-nested models according to the data.
By penalizing models with a higher number of parameters, these criteria
strike a balance between achieving a high goodness of fit
and keeping the model complexity low to avoid overfitting.
}\DIFaddend
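As an illustration of how these criteria trade goodness of fit against model complexity, the sketch below uses the form commonly adopted for least-squares fits, $\mathrm{AIC} = N \ln(\chi^2/N) + 2k$ and $\mathrm{BIC} = N \ln(\chi^2/N) + k \ln N$, where $N$ is the number of data points and $k$ the number of varied parameters. The function names and all numerical values are hypothetical, and whether the fitting library reports exactly this form is an assumption to be checked against the lmfit documentation.

```python
import math


def aic_least_squares(chisqr, n_points, n_params):
    """AIC for a least-squares fit: N*ln(chi2/N) + 2*k."""
    return n_points * math.log(chisqr / n_points) + 2 * n_params


def bic_least_squares(chisqr, n_points, n_params):
    """BIC for a least-squares fit: N*ln(chi2/N) + k*ln(N)."""
    return (n_points * math.log(chisqr / n_points)
            + math.log(n_points) * n_params)


# Hypothetical comparison on the same 40-bin histogram:
# a 2-peak model (6 parameters) versus a 3-peak model (9 parameters)
# that lowers chi-square only marginally.
n_bins = 40
aic_2 = aic_least_squares(0.50, n_bins, 6)
aic_3 = aic_least_squares(0.48, n_bins, 9)
bic_2 = bic_least_squares(0.50, n_bins, 6)
bic_3 = bic_least_squares(0.48, n_bins, 9)

# Lower values are better: here the extra parameters are not justified.
prefer_simpler_model = aic_2 < aic_3 and bic_2 < bic_3
```

In this made-up case, the small improvement in $\chi^2$ does not compensate for the penalty on the three additional parameters, so both criteria favor the simpler model.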
Examples of definition and modification of fit models are provided in
the aforementioned μs-ALEX notebook
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/FRETBursts%20-%20us-ALEX%20smFRET%20burst%20analysis.ipynb#FRET-fit:-in-depth-example}{link}).
Users can also refer to the comprehensive lmfit documentation
(\href{http://lmfit.github.io/lmfit-py/}{link}).
\DIFaddbegin \subsection*{\DIFadd{FRET Dynamics}}
\label{sec:dynamics}
\DIFaddend
\DIFaddbegin
\DIFadd{Single-molecule FRET histograms show more information than just mean FRET efficiencies.
While in general the presence of several peaks clearly indicates the existence of
multiple subpopulations, a single peak cannot a priori be associated with
a single population defined by a unique FRET efficiency without further analysis (such as, for instance, shot-noise analysis~\mbox{%DIFAUXCMD
\cite{Nir_2006,Antonik2006}}%DIFAUXCMD
).}\DIFaddend
\DIFaddbegin \DIFadd{Shot-noise analysis~\mbox{%DIFAUXCMD
\cite{Nir_2006} }%DIFAUXCMD
or probability
distribution analysis (PDA)~\mbox{%DIFAUXCMD
\cite{Antonik2006,kalinin_probability_2007}
}%DIFAUXCMD
allow computing the minimum width of a static FRET population
(i.e. the width caused by the statistics of discrete photon-detection events).
Typically, several mechanisms
contribute to the broadening of the experimental FRET peak
beyond the shot-noise limit. These include heterogeneities in the sample
resulting in a distribution of Förster radii,
or actual conformational changes giving rise to a distribution
of D-A distances~\mbox{%DIFAUXCMD
\cite{sisamakis_accurate_2010}}%DIFAUXCMD
.
}
\DIFadd{Gopich and Szabo developed an elegant analytical model
for the FRET distribution of $M$ interconverting states
based on superposition of Gaussian peaks~\mbox{%DIFAUXCMD
\cite{gopich_fret_2010}}%DIFAUXCMD
.
Unfortunately, the method is not straightforward to apply to
freely-diffusing data as it requires a special selection
criterion for filtering bursts with quasi-Poisson rates.
Santoso~\mbox{%DIFAUXCMD
\cite{santoso_probing_2009} }%DIFAUXCMD
and Kalinin~\mbox{%DIFAUXCMD
\cite{Kalinin2010}
}%DIFAUXCMD
extended the PDA approach to estimate conversion rates between different
states by comparing FRET histograms as a function of the time-bin size.
In addition, Gopich and Szabo~\mbox{%DIFAUXCMD
\cite{Gopich2009, gopich_theory_2011} }%DIFAUXCMD
developed
a related method to compute conversion rates using
a likelihood function which depends on photon timestamps (overcoming
the time binning and FRET histogramming step and directly applicable
to freely-diffusing data).
For measurements that include fluorescence lifetime, the multiparameter fluorescence
detection (MFD) method allows identifying dynamics from the deviation
from the linear relation between lifetime and E~\mbox{%DIFAUXCMD
\cite{sisamakis_accurate_2010}}%DIFAUXCMD
.
Hoffmann~\mbox{%DIFAUXCMD
\cite{hoffmann_quantifying_2011} }%DIFAUXCMD
proposed a method
called RASP (recurrence analysis of single particles) to extend
the timescale of detectable kinetics.
RASP computes the probability that two nearby bursts are due to
the same molecule, which allows setting a time threshold
for considering consecutive bursts as the same single-molecule event.
}
\DIFadd{Other interesting approaches include combining smFRET and FCS
to detect and quantify kinetics on timescales much shorter
than the diffusion
time~\mbox{%DIFAUXCMD
\cite{laurence_correlation_2007,torres_measuring_2007,nettels_unfolded_2008}}%DIFAUXCMD
.
In addition, Bayes-based methods have been proposed to fit static
populations~\mbox{%DIFAUXCMD
\cite{devore_classic_2012,murphy_bayesian_2014}}%DIFAUXCMD
, or to study dynamics~\mbox{%DIFAUXCMD
\cite{kou_bayesian_2005}}%DIFAUXCMD
.
}
\DIFadd{Finally, two related methods for discriminating between static heterogeneity
and sub-millisecond dynamics are Burst Variance Analysis
(BVA) proposed by Torella~\mbox{%DIFAUXCMD
\cite{Torella_2011} }%DIFAUXCMD
and
the two-channel kernel-based density estimator (2CDE) proposed by
Tomov~\mbox{%DIFAUXCMD
\cite{Tomov_2012}}%DIFAUXCMD
. The BVA method is described in the next section.
The 2CDE method, which has been implemented in FRETBursts, computes local
photon rates from timestamps within bursts using
Kernel Density Estimation (KDE)
(FRETBursts includes general-purpose functions
to compute KDE of photon timestamps in the }\verb|phrates| \DIFadd{module,
(}\href{http://fretbursts.readthedocs.io/en/latest/phrates.html}{link}\DIFadd{)).
From the time variations of local rates, it is possible to
detect the occurrence of dynamics. In particular, the 2CDE method
builds, for each burst, a quantity $(E)_D$ (or $(1-E)_A$) which is equal
to the burst average $E$ when no dynamics are present, but is biased
toward a higher (or lower) value in the presence of dynamics. From these
quantities a burst ``estimator''
(called FRET-2CDE) is derived. For a user, the 2CDE method consists
of plotting the 2-D histogram of $E$ versus FRET-2CDE
and assessing the vertical position of the various populations:
populations centered around FRET-2CDE=10 have
no dynamics, while populations biased towards higher FRET-2CDE values
have dynamics.
}
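The idea of a local photon rate computed from timestamps can be sketched in a few lines. The example below is only a conceptual illustration, not the \verb|phrates| API: the function name \verb|local_rates|, the choice of a symmetric exponential kernel with time constant \verb|tau| and the timestamps are all assumptions made here.

```python
import math


def local_rates(timestamps, tau):
    """Unnormalized kernel density estimate of the photon rate,
    evaluated at each timestamp.

    Each photon contributes exp(-|t - s| / tau) to the density at
    time t, so photons in dense regions get larger values.
    This direct double loop costs O(N^2) and is meant for clarity only.
    """
    return [sum(math.exp(-abs(t - s) / tau) for s in timestamps)
            for t in timestamps]


# Hypothetical timestamps (arbitrary units): one isolated photon,
# a cluster of three photons and another isolated photon.
timestamps = [0.0, 1.0, 1.1, 1.2, 5.0]
rates = local_rates(timestamps, tau=0.5)

# The photon in the middle of the cluster has the highest local rate.
densest = rates.index(max(rates))
```

Computing such densities separately for donor and acceptor photons gives the time-resolved rates whose variations within a burst reveal dynamics.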
\DIFadd{The BVA and 2CDE methods are implemented
in two notebooks included with FRETBursts
(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{BVA link},
\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%202CDE%20Method.ipynb}{2CDE link}\DIFadd{).
To use them, a user needs to download the relevant notebook
and run the analysis therein.
The other methods mentioned in this section are not currently
implemented in FRETBursts.
However, users can implement additional methods of their choice
by taking advantage of FRETBursts functions for burst analysis
and timestamps/bursts manipulation.
To facilitate this task, in the next section,
we show how to perform low-level analysis of timestamps and bursts data
by implementing the BVA method from scratch.
An additional example showing how to split bursts into constant time-bins
can be found in the respective FRETBursts notebook
(}\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Working%20with%20timestamps%20and%20bursts.ipynb}{link}\DIFadd{).
These examples serve as a guide for implementing new methods.
We welcome questions, on GitHub or on the mailing list,
from researchers who wish to implement new methods.
We also encourage sharing any new methods implemented in FRETBursts
for the benefit of the entire community.
}
\section*{Implementing Burst Variance Analysis}
\label{sec:bva}
In this section, we describe how to implement burst variance analysis (BVA)
as described in~\cite{Torella_2011}.
FRETBursts provides well-tested, general-purpose functions for timestamps and burst data
manipulation and therefore simplifies implementing custom burst analysis algorithms such as BVA.
\subsection*{BVA Overview}
\DIFdelbegin \DIFdel{Single-molecule FRET histograms show more information than just mean FRET efficiencies.
While in general the presence of several peaks clearly indicates the existence of
multiple subpopulations, a single peak cannot a priori be associated with
a single population defined by a unique FRET efficiency without further analysis
(such as, for instance, shot-noise analysis~\mbox{\cite{Nir_2006,Antonik2006}}).}
\DIFdel{The FRET histogram of a single FRET population has a minimum width set by shot noise
(i.e. the width is caused by the statistics of discrete photon-detection events).
FRET distributions broader than the shot noise limit,
can be ascribed to either a static mixture of species with slightly different FRET efficiencies,
or to a specie undergoing dynamic transitions (e.g. interconversion between multiple states,
diffusion in a continuum of conformations, binding-unbinding events, etc.).
When the single peak of a FRET distribution is wider than predicted from shot-noise,
it is not possible to discriminate between the static and dynamic case without further analysis.}\DIFdelend
The BVA method has been developed to \DIFdelbegin \DIFdel{address this issue, namely identifying }\DIFdelend \DIFaddbegin \DIFadd{identify }\DIFaddend the presence of dynamics
in FRET distributions~\cite{Torella_2011},
and has been successfully applied to identify biomolecular processes with
dynamics on the millisecond time-scale~\cite{Torella_2011, Robb_2013}.
The basic idea behind BVA is to subdivide bursts into contiguous burst chunks (sub-bursts)
comprising a fixed number $n$ of photons,
and to compare the empirical variance of acceptor counts of all sub-bursts in a burst,
with the theoretical shot-noise-limited variance.
An empirical variance of sub-bursts larger than the shot-noise limited value indicates
the presence of dynamics. Since the estimation of the sub-bursts variance is affected
by uncertainty, BVA provides an indication of a higher or lower probability
of observing dynamics.
In a FRET (sub-)population originating from a single static FRET efficiency,
the sub-bursts acceptor counts $n_a$ can be modeled as a binomial-distributed random variable
$N_a \sim \operatorname{B}(n, E_p)$, where $n$ is the number of photons in each sub-burst and
$E_p$ is the estimated population proximity-ratio (PR).
Note that we can use the PR because, regardless of the molecular FRET efficiency,
the detected counts are partitioned between donor and acceptor channels according to
a binomial distribution with success probability equal to the PR.
The only approximation made here is neglecting the presence of background
(a reasonable approximation since the background counts are in general a
very small fraction of the total counts).
We refer the interested reader to~\cite{Torella_2011} for further discussion.
If $N_a$ follows a binomial distribution, the random variable $E_{\textrm{sub}} = N_a/n$
has the standard deviation given in eq.~\ref{eq:binom_std}.
\begin{equation}
\label{eq:binom_std}
\operatorname{Std}(E_{\textrm{sub}}) = \left( \frac{E_p\,(1 - E_p)}{n} \right)^{1/2}
\end{equation}
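Eq.~\ref{eq:binom_std} is straightforward to evaluate numerically. The following sketch, in which the function name \verb|shot_noise_std| and the grid of PR values are illustrative assumptions, computes the points of the expected shot-noise-limited curve for a given sub-burst size:

```python
import math


def shot_noise_std(E_p, n):
    """Binomial standard deviation of E_sub = N_a / n (eq. binom_std)."""
    return math.sqrt(E_p * (1 - E_p) / n)


n = 7  # photons per sub-burst
# Expected-curve points (PR, std) on a grid of PR values from 0 to 1.
curve = [(i / 20, shot_noise_std(i / 20, n)) for i in range(21)]
```

The curve vanishes at PR values of 0 and 1 and is maximal at a PR of 0.5, where it equals $1/(2\sqrt{n})$.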
BVA consists of four steps: 1) dividing bursts into consecutive sub-bursts
containing a constant number of consecutive photons~\textit{n}, 2) computing the PR
of each sub-burst, 3) calculating the empirical standard deviation ($s_E$) of sub-bursts
PR in each burst, and 4) comparing $s_E$ to the expected standard deviation
of a shot-noise-limited distribution~(eq.~\ref{eq:binom_std}).
If, as in figure~\ref{fig:bva_static}, the observed FRET efficiency distribution
originates from a static mixture of sub-populations (of different
non-interconverting molecules) characterized by distinct FRET efficiencies,
$s_E$ of each burst is only affected by shot-noise and will follow the expected
standard deviation curve based on eq.~\ref{eq:binom_std}.
Conversely, if the observed distribution originates from biomolecules belonging to a single species,
which interconverts between different FRET sub-populations (over times comparable to the diffusion
time), as in figure~\ref{fig:bva_dynamic}, $s_E$ of each burst will be larger than the expected
shot-noise-limited standard deviation, and will be located above the shot-noise standard
deviation curve (right panel of figure~\ref{fig:bva_dynamic}).
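The difference between the static and dynamic cases can be reproduced with a small simulation. The sketch below is a deliberately crude toy model and not part of FRETBursts: all names and parameter values are assumptions, each photon is drawn as acceptor or donor with probability given by the PR, and in the dynamic case the state is redrawn at random for every sub-burst.

```python
import math
import random
import statistics


def sub_burst_std(acceptor_flags, n):
    """Standard deviation of the PR over consecutive n-photon sub-bursts."""
    pr_values = [sum(acceptor_flags[i:i + n]) / n
                 for i in range(0, len(acceptor_flags) - n + 1, n)]
    return statistics.pstdev(pr_values)  # population std (ddof = 0)


def simulate_burst(pr_per_sub_burst, n, rng):
    """Draw n photons per sub-burst; 1 marks an acceptor photon."""
    return [1 if rng.random() < pr else 0
            for pr in pr_per_sub_burst for _ in range(n)]


rng = random.Random(0)  # fixed seed for reproducibility
n, n_sub, n_bursts = 7, 10, 2000

# Static case: every sub-burst has the same PR of 0.5.
s_E_static = [sub_burst_std(simulate_burst([0.5] * n_sub, n, rng), n)
              for _ in range(n_bursts)]

# Dynamic case: each sub-burst is in a low-PR or a high-PR state,
# so that the burst-average PR is still about 0.5.
s_E_dynamic = [
    sub_burst_std(
        simulate_burst([rng.choice((0.2, 0.8)) for _ in range(n_sub)],
                       n, rng), n)
    for _ in range(n_bursts)]

shot_noise = math.sqrt(0.5 * (1 - 0.5) / n)
mean_static = statistics.mean(s_E_static)
mean_dynamic = statistics.mean(s_E_dynamic)
```

In this toy model, \verb|mean_static| is expected to stay close to \verb|shot_noise|, whereas \verb|mean_dynamic| lies clearly above it, although both populations have the same average PR.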
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/ALEX_BVA_static/ALEX_BVA_static}
\caption{\label{fig:bva_static} \textbf{BVA distribution for a static mixture sample.}
The left panel shows the E-S histogram for a mixture of single stranded DNA (20dT) and double stranded DNA (20dT-20dA) molecules in 200 mM MgCl$_2$. The right panel shows the corresponding BVA plot. Since both 20dT and 20dT-20dA are stable and have no dynamics, the BVA plot shows $s_E$ peaks lying on the static standard deviation curve (\textit{red curve}).%
}
\end{center}
\end{figure}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/ALEX_BVA_dynamic/ALEX_BVA_dynamic}
\caption{\label{fig:bva_dynamic} \textbf{BVA distribution for a hairpin sample undergoing dynamics.}
The left panel shows the E-S histogram for a single stranded DNA sample ($A_{31}$-TA, see~\cite{Tsukanov_2013}), designed to form a transient hairpin in 400 mM NaCl. The right panel shows the corresponding BVA plot. Since the transition between hairpin and open structure causes a significant change in FRET efficiency, $s_E$ lies largely above the static standard deviation curve (\textit{red curve}).%
}
\end{center}
\end{figure}
\subsection*{BVA Implementation}
The following paragraphs describe the low-level details involved in implementing the BVA using FRETBursts.
The main goal is to illustrate a real-world example of accessing and manipulating timestamps and burst data.
For a ready-to-use BVA implementation users can refer to the corresponding notebook included with FRETBursts
(\href{http://nbviewer.jupyter.org/github/tritemio/FRETBursts_notebooks/blob/master/notebooks/Example%20-%20Burst%20Variance%20Analysis.ipynb}{link}).
\paragraph*{Python details}
For BVA implementation, two photon streams are needed: all-photons during donor excitation (\DIFdelbegin \DIFdel{Dex}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}}\DIFaddend )
and acceptor photons during donor excitation (\DIFdelbegin \DIFdel{DexAem}\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex}A\textsubscript{em}}\DIFaddend ).
These photon stream selections are obtained by computing boolean masks as follows
(see\DIFdelbegin \DIFdel{section}\DIFdelend ~\nameref{sec:burststimes}):
\begin{lstlisting}
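# Masks relative to the all-photons timestamps array
Dex_mask = ds.get_ph_mask(ph_sel=Ph_sel(Dex='DAem'))
DexAem_mask = ds.get_ph_mask(ph_sel=Ph_sel(Dex='Aem'))
# Mask of A-emitted photons relative to the D-excitation photon stream
DexAem_mask_d = DexAem_mask[Dex_mask]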
\end{lstlisting}
Here, the first two variables (\verb|Dex_mask| and \verb|DexAem_mask|)
select photons from the all-photons timestamps array,
while \verb|DexAem_mask_d|, selects A-emitted photons from the
array of photons emitted during D-excitation. As shown below,
the latter is needed to count acceptor photons in burst chunks.
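The effect of indexing one mask with another can be seen on a toy record. In the sketch below, the eight-photon masks are hypothetical values and plain Python lists stand in for the NumPy boolean arrays used in the actual analysis:

```python
# Hypothetical 8-photon record (masks refer to the all-photons array).
Dex_mask = [True, False, True, True, False, True, False, True]
DexAem_mask = [False, False, True, False, False, True, False, False]

# List equivalent of the NumPy expression DexAem_mask[Dex_mask]:
# keep the DexAem flag only for photons detected during D-excitation.
DexAem_mask_d = [a for a, d in zip(DexAem_mask, Dex_mask) if d]

n_dex_photons = sum(Dex_mask)           # photons during D-excitation
n_acceptor_photons = sum(DexAem_mask_d)  # of which emitted by A
```

The resulting mask has one element per D-excitation photon, so it can be sliced with indexes that refer to the D-excitation photon stream.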
Next, we need to express bursts start-stop data as indexes of the D-excitation photon stream
(by default burst start-stop indexes refer to all-photons timestamps array):
\begin{lstlisting}
ph_d = ds_FRET.get_ph_times(ph_sel=Ph_sel(Dex='DAem'))
bursts = ds_FRET.mburst[0]
bursts_d = bursts.recompute_index_reduce(ph_d)
\end{lstlisting}
Here, \verb|ph_d| contains the \DIFdelbegin \DIFdel{Dex }\DIFdelend \DIFaddbegin \DIFadd{D\textsubscript{ex} }\DIFaddend timestamps, \verb|bursts| the original burst data and
\verb|bursts_d| the burst data with start-stop indexes relative to \verb|ph_d|.
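Conceptually, re-expressing burst indexes relative to a photon selection amounts to counting the selected photons that precede each index. The following sketch illustrates the idea with a hypothetical helper, \verb|reindex_to_selection|, written for this example. It assumes inclusive start-stop indexes and is not the implementation of \verb|recompute_index_reduce|, whose exact behavior should be checked in the FRETBursts documentation.

```python
from itertools import accumulate


def reindex_to_selection(istart, istop, mask):
    """Map inclusive start/stop indexes of the full photon array to
    indexes of the reduced array containing only the masked photons."""
    # cum[i] = number of selected photons among the first i photons
    cum = [0] + list(accumulate(int(m) for m in mask))
    new_start = cum[istart]        # selected photons before the burst
    new_stop = cum[istop + 1] - 1  # last selected photon in the burst
    return new_start, new_stop


# Hypothetical D-excitation mask for an 8-photon record.
Dex_mask = [True, False, True, True, False, True, False, True]

# A burst spanning photons 1..5 of the full array contains the
# selected photons 2, 3 and 5, i.e. elements 1..3 of the reduced array.
start_d, stop_d = reindex_to_selection(1, 5, Dex_mask)
```

If a burst contains no selected photons, this helper returns a stop index smaller than the start index, a case that a real implementation has to handle explicitly.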
Finally, with the previous variables at hand, the BVA algorithm
can be easily implemented by computing the $s_E$ quantity for each burst:
\begin{lstlisting}
n = 7
E_sub_std = []
for burst in bursts_d:
    E_sub = []
    startlist = range(burst.istart, burst.istop + 2 - n, n)
    stoplist = [i + n for i in startlist]
    for start, stop in zip(startlist, stoplist):
        A_D = DexAem_mask_d[start:stop].sum()
        E = A_D / n
        E_sub.append(E)
    E_sub_std.append(np.std(E_sub))
\end{lstlisting}
Here, \verb|n| is the BVA parameter defining the number of photons in each burst chunk.
The outer loop iterates through bursts, while the inner loop iterates through sub-bursts.
The variables \verb|startlist| and \verb|stoplist| are the lists of start-stop indexes for
all sub-bursts in the current burst.
In the inner loop, \verb|A_D| and \verb|E| contain the number of acceptor photons and
FRET efficiency for the current sub-burst. Finally, for each burst, the standard deviation
of \verb|E| is appended to the list \verb|E_sub_std|.
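The index arithmetic used to build \verb|startlist| can be checked in isolation. The sketch below wraps it in a hypothetical helper, \verb|sub_burst_ranges|, and assumes, as the \verb|+ 2| in the listing suggests, that \verb|istop| is the index of the last photon of the burst (inclusive), while the returned stop values are exclusive slice bounds:

```python
def sub_burst_ranges(istart, istop, n):
    """Return (start, stop) slice bounds of consecutive n-photon
    sub-bursts in a burst spanning photons istart..istop (inclusive).
    Trailing photons that do not fill a whole sub-burst are dropped."""
    startlist = range(istart, istop + 2 - n, n)
    return [(start, start + n) for start in startlist]


# A 21-photon burst splits exactly into three 7-photon sub-bursts.
full = sub_burst_ranges(10, 30, 7)

# A 20-photon burst yields two sub-bursts; 6 photons are left out.
partial = sub_burst_ranges(10, 29, 7)

# A burst with fewer than n photons yields no sub-bursts at all.
too_short = sub_burst_ranges(10, 14, 7)
```

The last case matters in practice: for such a burst the list of sub-burst values is empty and its standard deviation is undefined, so it is worth selecting bursts with enough photons before running the analysis.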
By plotting the 2D distribution of $s_E$ (i.e. \verb|E_sub_std|) versus the average (uncorrected) E
we obtain the BVA plots of figures~\ref{fig:bva_static} and~\ref{fig:bva_dynamic}.
\section*{Conclusions}
\label{sec:conclusions}
FRETBursts is an open source and openly developed (see~\nameref{sec:dev}) implementation % SI_link
of established smFRET burst analysis methods
made available to the single-molecule community.
It implements several novel concepts which improve the analysis results, such as
time-dependent background estimation, background-dependent burst search threshold,
burst weighting and $\gamma$-corrected burst size selection.
More importantly, FRETBursts provides a library of thoroughly-tested functions
for timestamps and burst manipulation, making it an ideal platform for
developing and comparing new analytical techniques.
We envision FRETBursts both as a state-of-the-art burst analysis
software and as a platform for development and assessment of novel algorithms.
To underpin this envisioned role, FRETBursts is developed following modern
software engineering practices, such as the DRY principle
(\href{http://en.wikipedia.org/wiki/Don\%27t_repeat_yourself}{link})
to reduce duplication and the KISS principle
(\href{http://en.wikipedia.org/wiki/KISS_principle}{link})
to reduce over-engineering. Furthermore, to minimize the number of software errors~\cite{Merali_2010,Soergel_2015},
we employ defensive programming~\cite{Prli__2012} which includes code readability,
unit and regression testing and continuous integration~\cite{Eglen_2016}.
Finally, being open source, any scientist can inspect the source code,
fix errors, and adapt it to her own needs.
We believe that, in the single-molecule community,
standard open source software implementations, such as FRETBursts, can enhance
reliability and reproducibility of analysis and promote a faster adoption of novel methods,
while reducing the duplication of efforts among different groups.
\section*{Acknowledgments}
We thank Dr. Eyal Nir and Dr. Toma Tomov for support in the implementation of the 2CDE method \DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{and Dr. Achilles Kapanidis and Dr. Nicole Robb for providing
experimental data for testing the BVA implementation.
}\DIFaddend This work was supported by National Institutes of Health (NIH)
grants R01-GM95904 and R01-GM069709. Dr. Weiss discloses equity in
Nesher Technologies and intellectual property used in the research
reported here. The work at UCLA was conducted in Dr. Weiss's Laboratory.
\section*{Supporting Information}
\paragraph*{S1 Appendix.}
\label{sec:notebook}
{\bf Notebook Workflow.} A description of the notebook workflow used by FRETBursts.
\paragraph*{S2 Appendix.}
\label{sec:dev}
{\bf Development and Contributions.} A description of development philosophy and techniques
as well as how to contribute to the FRETBursts project.
\paragraph*{S3 Appendix.}
\label{sec:burststimes}
{\bf Timestamps and Burst Data.} General concepts of how timestamps and
bursts data are stored and handled in FRETBursts.
\paragraph*{S4 Appendix.}
\label{sec:plotting}
{\bf Plotting \texttt{Data}.} A description of the syntax used to perform
plots in FRETBursts \DIFaddbegin \DIFadd{and of the 2-D hexagonal-bin histogram used in E-S plots}\DIFaddend .
\paragraph*{S5 Appendix.}
\label{sec:bg_opt_th}
{\bf Background Estimation With Optimal Threshold.} A description of
the algorithm used by FRETBursts to compute the
optimal threshold for background estimation.
\paragraph*{S6 Appendix.}
\label{sec:burstweights_theory}
{\bf Burst Weights.} Theory underpinning the choice of using burst size
as weights for FRET estimation.
\nolinenumbers
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}