
Finally, the threshold can also be estimated heuristically by passing \verb|tail_min_us='auto'|. For more details refer to the \href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.calc_bg}{\texttt{calc\_bg} documentation}.

\subsubsection{Error metric and optimal threshold}
The functions fitted to the background also return an estimate of the quality of fit, computed as the distance between the empirical \href{http://en.wikipedia.org/wiki/Cumulative_distribution_function}{cumulative distribution function} (CDF) and the fitted CDF. Two different distance metrics can be returned. The first is the \href{http://en.wikipedia.org/wiki/Kolmogorov\%E2\%80\%93Smirnov_test}{Kolmogorov-Smirnov} statistic (the maximum absolute difference between the empirical and the fitted CDF) and the second is the \href{http://en.wikipedia.org/wiki/Cram\%C3\%A9r\%E2\%80\%93von_Mises_criterion}{Cram\'er-von Mises} statistic, corresponding to the integral of the squared residuals (see the code \href{https://github.com/tritemio/FRETBursts/blob/master/fretbursts/background.py#L40}{here}). In principle, the threshold can be chosen as the value that minimizes the error metric. This approach is implemented by the function \href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.calc_bg_brute}{\texttt{calc\_bg\_brute}} in the \href{http://fretbursts.readthedocs.org/en/latest/plugins.html}{\texttt{burstlib\_ext} module}.
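The two error metrics and the brute-force search for the threshold can be illustrated with a minimal sketch on synthetic delays. It assumes that the delays above the threshold are exponential, so that the excess delays (delay minus threshold) are also exponential with the same rate and the MLE of the rate is the inverse of their mean. The Cram\'er-von Mises value uses the textbook computing formula and may differ in normalization from the FRETBursts code linked above; \verb|fit_tail_rate|, \verb|cdf_distances| and \verb|brute_threshold| are hypothetical helpers, not FRETBursts functions.
\begin{verbatim}
import numpy as np


def fit_tail_rate(delays, tail_min):
    """MLE of the background rate (1/s) from delays longer than tail_min.

    Exponential delays are memoryless: the excess (delay - tail_min) of
    the delays above the threshold is exponential with the same rate,
    whose MLE is 1 / mean(excess).
    """
    excess = delays[delays > tail_min] - tail_min
    return 1.0 / excess.mean(), excess


def cdf_distances(excess, rate):
    """Kolmogorov-Smirnov and Cramer-von Mises distances between the
    empirical CDF of `excess` and the fitted exponential CDF."""
    x = np.sort(excess)
    n = x.size
    i = np.arange(1, n + 1)
    fitted = 1.0 - np.exp(-rate * x)          # fitted CDF at each sample
    # KS: largest gap, checked just before and just after each ECDF step
    ks = max(np.max(i / n - fitted), np.max(fitted - (i - 1) / n))
    # CvM: textbook computing formula for n * integral (F_n - F)^2 dF
    cvm = 1.0 / (12 * n) + np.sum((fitted - (2 * i - 1) / (2 * n)) ** 2)
    return ks, cvm


def brute_threshold(delays, thresholds, use_ks=True):
    """Return the threshold minimizing the chosen error metric."""
    errors = []
    for th in thresholds:
        rate, excess = fit_tail_rate(delays, th)
        ks, cvm = cdf_distances(excess, rate)
        errors.append(ks if use_ks else cvm)
    errors = np.asarray(errors)
    return thresholds[np.argmin(errors)], errors


# Synthetic delays: 2 kcps Poisson background plus short in-burst delays
rng = np.random.default_rng(0)
delays = np.concatenate([rng.exponential(1 / 2e3, 50_000),    # background
                         rng.exponential(1 / 1e5, 20_000)])   # bursts
rate, excess = fit_tail_rate(delays, tail_min=500e-6)
ks, cvm = cdf_distances(excess, rate)
thresholds = np.array([1e-6, 5e-6, 100e-6, 500e-6])
best_th, errors = brute_threshold(delays, thresholds)
print(f"rate = {rate:.0f} cps, KS = {ks:.4f}, best threshold = {best_th:.0e} s")
\end{verbatim}
Note that the number of delays above the threshold changes with the threshold, so the raw metrics are not strictly comparable across thresholds; this is one reason why the minimum-error criterion holds only ``in principle''.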
For more information see this notebook[TODO].

\subsubsection{Introduction to burst search} \label{sec:burstsearch_intro}
After background estimation, burst search is the next fundamental step of the analysis. The core ``sliding window'' algorithm, proposed by Eggeling~\textit{et al.} in 1998~\cite{Eggeling_1998}, searches for bursts of photons in which $m$ consecutive photons are contained within a time window no longer than $\Delta t$. In other words, bursts are portions of the photon stream where the local rate (computed using $m$ photons) is above a minimal rate chosen as a threshold. Eggeling~\textit{et al.} did not provide any criterion for choosing the rate threshold and the number of photons $m$; therefore, it has become common practice to manually tweak these parameters for each specific measurement. A more general approach consists in taking into account the background rate of the specific measurement and choosing a rate threshold that is $F$ times larger than the background rate. Since the signal rate is the total rate minus the background rate, this approach ensures that all the resulting bursts have a signal-to-background ratio (SBR) larger than $(F-1)$~\cite{Michalet_2012}. A consistent criterion for choosing the threshold is very important when comparing measurements with different background rates, when the background changes significantly during a measurement, or in multi-spot measurements where each spot has a different background rate.

A second important aspect of burst search is the choice of the photon stream to be processed. Sometimes, for instance when identifying FRET populations, one would like to apply the burst search to all the photons. Other times, when focusing on donor-only or acceptor-only populations, it is better to use only the donor or acceptor signal. In general, one would like to be able to apply the burst search to an arbitrary selection of photons. In FRETBursts this is achieved by passing the appropriate \verb|Ph_sel| object to the burst search method (see section~\ref{sec:ph_streams} for more information on photon stream definitions).
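The sliding-window criterion described above can be made concrete with a short sketch. It assumes sorted timestamps in seconds and the convention $\Delta t = m / (F\, r_{bg})$ for converting the rate threshold into a time window (other conventions use $m-1$ instead of $m$); \verb|sliding_window_search| is a hypothetical helper, not the FRETBursts implementation. The synthetic stream uses a regular 1~kcps background so that the result is deterministic.
\begin{verbatim}
import numpy as np


def sliding_window_search(ts, m, T):
    """Sliding-window burst search on sorted timestamps `ts` (seconds).

    A window of m consecutive photons is above threshold when it spans at
    most T seconds. Runs of consecutive above-threshold windows are merged
    into a single burst. Returns the start and stop photon indices.
    """
    ts = np.asarray(ts)
    if ts.size < m:
        empty = np.array([], dtype=int)
        return empty, empty
    span = ts[m - 1:] - ts[:ts.size - m + 1]   # duration of each m-photon window
    above = (span <= T).astype(np.int8)
    edges = np.diff(above, prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)          # first window of each run
    last_win = np.flatnonzero(edges == -1) - 1   # last window of each run
    stops = last_win + m - 1                     # last photon of that window
    return starts, stops


# Rate threshold F times the background rate, converted to a time window
# with the (assumed) convention T = m / (F * bg_rate)
bg_rate, F, m = 1e3, 6, 10
T = m / (F * bg_rate)                          # about 1.67 ms

# Deterministic toy stream: regular 1 kcps background + 50 photons in ~1 ms
background = np.arange(0, 1, 1e-3)
burst = 0.50021 + 20e-6 * np.arange(50)
ts = np.sort(np.concatenate([background, burst]))

starts, stops = sliding_window_search(ts, m, T)
print(len(starts), stops - starts + 1)         # number of bursts, burst sizes
\end{verbatim}
Merging runs of above-threshold windows, rather than reporting each window, is what turns the local-rate test into bursts with a start and a stop photon.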
Finally, Nir~\textit{et al.}~\cite{Nir_2006} proposed a dual-channel burst search (DCBS) that mitigates, to some extent, artifacts due to photo-physical effects such as blinking. In this case a burst search is performed independently on two photon streams and bursts are marked only where both photon streams exhibit a rate higher than the threshold, implementing an AND-gate logic. Usually, the term DCBS refers to a burst search where the two photon streams are all the photons during donor excitation (\verb|Ph_sel(Dex='DAem')|) and the acceptor-channel photons during acceptor excitation (\verb|Ph_sel(Aex='Aem')|).

After this first level of burst search, it is important to select bursts according to their number of photons (burst size). In its most rudimentary form, this selection can be performed during burst search by discarding all the bursts with a size lower than a threshold $L$. This method, however, neglects the effect of background and gamma factor on the burst size and can bias the selection toward certain channels or certain sub-populations. For this reason we advocate performing the burst size selection after background correction and taking into account the gamma factor, as illustrated in section~\ref{sec:burstsel}.

\subsubsection{Burst search in FRETBursts} \label{sec:burstsearch_code}
In FRETBursts the standard burst search is performed by calling the \href{http://fretbursts.readthedocs.org/en/latest/data_class.html#fretbursts.burstlib.Data.burst_search}{\texttt{burst\_search} method}:
\begin{verbatim}
d.burst_search(F=6, m=10, ph_sel=Ph_sel('all'))
\end{verbatim}
The previous command performs a burst search on all photons (\verb|ph_sel=Ph_sel('all')|), with a minimum rate 6 times larger than the background rate (\verb|F=6|) and using 10 consecutive photons to compute the local rate (\verb|m=10|).
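The AND-gate logic of the DCBS can be illustrated by intersecting the bursts found independently in the two photon streams: a time interval is kept only where both searches report a burst. The following minimal sketch assumes that each search returns sorted, non-overlapping burst start and stop times; \verb|and_gate| is a hypothetical helper and not necessarily how FRETBursts implements it.
\begin{verbatim}
import numpy as np


def and_gate(starts1, stops1, starts2, stops2):
    """Intersect two sorted lists of non-overlapping [start, stop] intervals.

    A time point belongs to an output burst only if it lies inside a burst
    of stream 1 AND inside a burst of stream 2.
    """
    out_starts, out_stops = [], []
    i = j = 0
    while i < len(starts1) and j < len(starts2):
        lo = max(starts1[i], starts2[j])
        hi = min(stops1[i], stops2[j])
        if lo < hi:                       # non-empty overlap: keep it
            out_starts.append(lo)
            out_stops.append(hi)
        # discard the interval that ends first: it cannot overlap
        # any later interval of the other stream
        if stops1[i] < stops2[j]:
            i += 1
        else:
            j += 1
    return np.array(out_starts), np.array(out_stops)


# Burst start/stop times (ms) from two independent searches, e.g. on the
# Dex-DAem and Aex-Aem photon streams
dd_starts, dd_stops = [0.0, 10.0, 20.0], [5.0, 15.0, 25.0]
aa_starts, aa_stops = [3.0, 12.0, 30.0], [4.0, 18.0, 35.0]
starts, stops = and_gate(dd_starts, dd_stops, aa_starts, aa_stops)
print(starts, stops)   # only the overlapping portions survive
\end{verbatim}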
A different photon selection, threshold ($F$) or number of photons $m$ for the rate computation can be chosen by passing different values. These parameters are generally a good starting point for smFRET analysis but can be tweaked in specific cases. Note that the previous burst search performs no burst size selection (i.e. the minimum burst size is $m$). An additional parameter $L$ can be passed to apply a threshold on the raw burst size (before any correction). However, we suggest performing the more accurate burst size selection shown in section~\ref{sec:burstsel}.

In us-ALEX there are three important correction parameters: gamma factor, spectral leakage and acceptor direct excitation~\cite{Lee_2005}. These corrections can be applied by simply setting the respective \verb|Data| attributes, for example:
\begin{verbatim}
d.dir_ex = 0.08
\end{verbatim}
These attributes can be assigned either before or after the burst search. In the latter case, the burst data is automatically updated using the newly assigned correction values.

Sometimes it may be useful to specify a fixed threshold, instead of a threshold derived from the background rate as in the previous example. In this case, instead of $F$, we can use the argument \verb|min_rate_cps|, which specifies the threshold in counts per second (Hz). For example, a burst search with a 50~kHz threshold can be performed as follows:
\begin{verbatim}
d.burst_search(min_rate_cps=50e3, m=10, ph_sel=Ph_sel('all'))
\end{verbatim}
Finally, to perform a DCBS (or, in general, an AND-gate burst search, see section~\ref{sec:burstsearch_intro}) the plugin function \href{http://fretbursts.readthedocs.org/en/latest/plugins.html#fretbursts.burstlib_ext.burst_search_and_gate}{\texttt{burst\_search\_and\_gate}} can be used as in the following example:
\begin{verbatim}
d_dcbs = bext.burst_search_and_gate(d, F=6, m=10)
\end{verbatim}
Note that in this case a new \verb|Data| variable (\verb|d_dcbs|) is returned, containing all the data and the results of the new burst search. In order to save RAM, FRETBursts shares the timestamp and detector arrays between different copies of a \verb|Data| object (for example \verb|d| and \verb|d_dcbs|), while all the other data (including background and burst data) is copied. The function \verb|burst_search_and_gate| accepts the additional arguments \verb|ph_sel1| and \verb|ph_sel2| to specify different photon streams. The default values (\verb|ph_sel1=Ph_sel(Dex='DAem')| and \verb|ph_sel2=Ph_sel(Aex='Aem')|) correspond to the classical DCBS (see section~\ref{sec:burstsearch_intro}).
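The three us-ALEX corrections mentioned above enter the standard expressions for the FRET efficiency $E$ and the stoichiometry $S$~\cite{Lee_2005}. The sketch below applies them to burst counts; it assumes that leakage is expressed as a fraction of the donor counts and direct excitation as a fraction of \verb|naa|, which may not match exactly the definitions used by every FRETBursts version. The function \verb|corrected_E_S| is a hypothetical helper, not part of the FRETBursts API.
\begin{verbatim}
import numpy as np


def corrected_E_S(nd, na, naa, gamma=1.0, leakage=0.0, dir_ex=0.0):
    """Return FRET efficiency E and stoichiometry S for each burst.

    nd, na: background-corrected donor/acceptor counts (donor excitation)
    naa:    background-corrected acceptor counts (acceptor excitation)
    """
    nd, na, naa = (np.asarray(x, dtype=float) for x in (nd, na, naa))
    # remove donor leakage into the acceptor channel and acceptor direct
    # excitation by the donor laser (both assumed proportional)
    na_corr = na - leakage * nd - dir_ex * naa
    dex_total = na_corr + gamma * nd          # gamma-weighted Dex signal
    E = na_corr / dex_total
    S = dex_total / (dex_total + naa)
    return E, S


E, S = corrected_E_S(nd=[100, 40], na=[50, 120], naa=[60, 90],
                     gamma=0.65, leakage=0.1, dir_ex=0.05)
print(np.round(E, 3), np.round(S, 3))
\end{verbatim}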

\subsection{Burst selection} \label{sec:burstsel}
After burst search it is common to select bursts according to different criteria, one of the most common being the burst size. For example, to select bursts with more than 100 photons (after background correction) detected during the donor excitation periods we can write:
\begin{verbatim}
ds = Sel(d, select_bursts.size, th1=100)
\end{verbatim}
The previous command creates a new \verb|Data| variable (\verb|ds|) containing the selected bursts. As mentioned before, the new object shares the photon data arrays with the original object (\verb|d|) in order to minimize RAM consumption. Note that the \href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#fretbursts.burstlib.Sel}{\texttt{Sel}} function requires a ``selection criterion'' (a python function) as second argument; all the remaining arguments are passed to the selection function. The module \verb|select_bursts| contains numerous built-in selection functions, for example to select a region of the E-S ALEX histogram (\verb|select_bursts.ES|), to select bursts based on their duration (\verb|select_bursts.width|) and so on. New criteria can easily be implemented by defining a new selection function, usually no longer than a couple of lines (see the \href{https://github.com/tritemio/FRETBursts/blob/master/fretbursts/select_bursts.py}{\texttt{select\_bursts} module} for several examples). Finally, note that different criteria can be combined by applying them in sequence. For example, with the following commands
\begin{verbatim}
ds = Sel(d, select_bursts.size, th1=50, th2=200)
dsw = Sel(ds, select_bursts.width, th1=0.5e-3, th2=3e-3)
\end{verbatim}
the variable \verb|dsw| will contain all the bursts with sizes between 50 and 200 photons and durations between 0.5 and 3~ms.

\subsubsection{Burst size selection}
In the previous section we defined the ``burst size'' as the total number of counts detected in the donor and acceptor channels during donor excitation periods. We can modify the selection command to also include photons detected in the acceptor channel during acceptor excitation periods. This is achieved by passing the boolean flag \verb|add_naa=True| to the selection function as follows:
\begin{verbatim}
ds = Sel(d, select_bursts.size, th1=100, add_naa=True)
\end{verbatim}
Another important parameter in defining the burst size is the gamma factor, i.e. the imbalance between the donor and acceptor channels. The gamma factor corrects for the different quantum yields of the D and A fluorophores and the different photon-detection efficiencies of the D and A channels. Neglecting the gamma factor in the burst size leads to a biased burst selection, especially if $\gamma$ differs significantly from 1. To include the effect of $\gamma$ on the burst size and obtain a ``fair'' burst selection (i.e. a selection that does not favor high or low FRET states) we need to pass the argument \verb|gamma| (or \verb|gamma1|) as in the following example:
\begin{verbatim}
ds = Sel(d, select_bursts.size, th1=100, gamma=0.65)
\end{verbatim}
For more information on burst size selection refer to the \href{http://fretbursts.readthedocs.org/en/latest/burst_selection.html#fretbursts.select_bursts.size}{\texttt{select\_bursts.size} documentation}.
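The ``fair'' burst size and the sequential combination of criteria can be sketched with plain numpy boolean masks. The sketch assumes that the gamma-corrected size is $\gamma\, n_d + n_a$ (optionally plus $n_{aa}$); it is a simplified illustration with hypothetical helper names (\verb|burst_size|, \verb|range_mask|), not the code of \verb|select_bursts.size|. The toy counts show why the correction matters: two bursts from equally bright molecules with low and high $E$ have different raw sizes but the same corrected size.
\begin{verbatim}
import numpy as np


def burst_size(nd, na, naa=None, gamma=1.0, add_naa=False):
    """Gamma-corrected burst size: gamma*nd + na (+ naa if add_naa)."""
    size = gamma * np.asarray(nd, dtype=float) + np.asarray(na, dtype=float)
    if add_naa:
        size = size + np.asarray(naa, dtype=float)
    return size


def range_mask(values, th1, th2=np.inf):
    """Boolean mask of bursts with th1 <= value <= th2."""
    values = np.asarray(values)
    return (values >= th1) & (values <= th2)


# Two bursts from equally bright molecules, low E (0.2) and high E (0.8),
# with gamma = 0.65: the raw sizes differ, the corrected sizes do not.
nd = np.array([160, 40])
na = np.array([26, 104])
width = np.array([1.2e-3, 4.0e-3])            # burst durations (s)

raw = burst_size(nd, na)                      # 186 and 144 photons
fair = burst_size(nd, na, gamma=0.65)         # about 130 for both bursts

# Applying criteria in sequence is equivalent to AND-ing their masks
mask = range_mask(fair, 100, 200) & range_mask(width, 0.5e-3, 3e-3)
print(raw, fair, mask)
\end{verbatim}
With a raw-size threshold between 144 and 186, the low-$E$ burst would pass and the high-$E$ burst would be discarded, which is the selection bias discussed above.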

\section{Architecture and concepts}
In this section we introduce some general concepts and naming conventions used in FRETBursts.

\subsection{Photon streams} \label{sec:ph_streams}
The fundamental data at the core of smFRET experiments is the array of photon arrival timestamps, with a resolution on the order of 10~ns. In single-spot measurements, all the timestamps are stored in a single array. In multi-spot measurements there are as many timestamp arrays as excitation spots. Each timestamp array contains timestamps from both the donor (D) and the acceptor (A) channels. In ALEX measurements, we can further differentiate between photons emitted during D and A excitation periods. In FRETBursts the different selections of photons/timestamps are called ``photon streams'' and they are specified with a \href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{\texttt{Ph\_sel} object}. In non-ALEX smFRET data there are 3 base photon streams (table~\ref{tab:ph_sel_smfret}), while in ALEX data there are 5 base photon streams (table~\ref{tab:ph_sel_alex}). The \href{http://fretbursts.readthedocs.org/en/latest/ph_sel.html}{\texttt{Ph\_sel} class} can express any combination of photon streams. For example, in ALEX measurements the D-emission during A-excitation stream is usually excluded because it does not contain any useful signal~\cite{Lee_2005}. To select all the photons except this photon stream we write \verb|Ph_sel(Dex='DAem', Aex='Aem')|, which literally means \textit{select donor and acceptor photons (DAem) during donor excitation (Dex) and only acceptor photons (Aem) during acceptor excitation (Aex)}.
\begin{table} \begin{tabular}{l|l}

\subsection{Background definitions} \label{sec:bg_intro}
Even when no molecule is crossing the excitation volume, there are ``background counts'' due to detector dark counts, out-of-focus molecules, and sample scattering and/or auto-fluorescence. Figure~\ref{fig:bgdist} shows the typical distribution of timestamp delays (i.e. the waiting times between two subsequent timestamps) in a smFRET measurement. The ``tail'' of the distribution (a straight line in semi-log scale) corresponds to exponentially distributed delays, indicating that those counts are generated by a \href{http://en.wikipedia.org/wiki/Poisson_process}{Poisson process}. At short time scales, the distribution departs from the exponential because of the bursts of photons from diffusing single molecules (the signal). To estimate the background rate (i.e. the rate of the exponential tail, the inverse of its time constant) we need to select a minimal delay threshold above which the distribution can be considered exponential. We also need to choose a fitting method, for example Maximum Likelihood Estimation (MLE) or a non-linear least-squares (NLSQ) fit of the histogram. Both burst search and burst correction require background rates for all the different photon streams. Furthermore, we want to estimate the background periodically (every few seconds) because it can vary during the measurement on time scales of tens of seconds. FRETBursts splits the data into uniform time slices called \textit{background periods} and computes the background rates for each of these slices (see section~\ref{sec:bg_calc}). The slicing in background periods is also used during burst search to compute a background-dependent threshold and to apply the burst corrections (section~\ref{sec:burstsearch_intro}).

\subsection{The \texttt{Data} class} \label{sec:data_intro}
The \href{http://fretbursts.readthedocs.org/en/latest/data_class.html}{\texttt{Data} class} is the fundamental data container in FRETBursts.
It contains the measurement data and provides several methods for data analysis (background estimation, burst search, etc.). It also stores all the analysis results (burst data, estimated parameters). All the arrays in \verb|Data| are contained in lists whose length is equal to the number of excitation spots. This means that for single-spot measurements all the arrays are wrapped in 1-element lists. For example, the burst data field \verb|Data.mbursts| is a 1-element list and \verb|Data.mbursts[0]| is the actual numpy array of burst data. \verb|Data| implements a shortcut syntax that allows accessing \verb|Data.mbursts[0]| as \verb|Data.mbursts_| (valid for all the fields). As an example, the following are some important burst-data fields:
\begin{itemize}
\item \verb|nd|: number of photons detected through the donor channel during donor excitation, after correction
\item \verb|na|: number of photons detected through the acceptor channel during donor excitation, after correction
\item \verb|naa|: number of photons detected through the acceptor channel during acceptor excitation, after correction
\end{itemize}

\subsection{Plotting \texttt{Data}}
FRETBursts uses matplotlib~\cite{2096e2a4-8f50-4519-bfb3-f796da201630} to provide a wide range of built-in plot functions for \verb|Data| objects. The plot syntax is the same for single- and multi-spot measurements. Almost all the plot commands are called through the wrapper function \verb|dplot|; for example, to plot a timetrace of the photon data we type:
\begin{verbatim}
dplot(d, timetrace)
\end{verbatim}
The function \verb|dplot| is the generic plot function that creates the figure and handles details common to all the plotting functions (e.g. the title). Here \verb|d| is the \verb|Data| variable and \verb|timetrace| is the actual plot function that operates on a single channel. In multi-spot measurements \verb|dplot| creates one subplot for each spot and calls \verb|timetrace| for each channel.
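The per-spot list layout and the trailing-underscore shortcut described above can be reproduced with Python's \verb|__getattr__| hook. The following is a minimal sketch under that assumption; the class \verb|SpotFields| is hypothetical and does not claim to reproduce how \verb|Data| is actually implemented.
\begin{verbatim}
import numpy as np


class SpotFields:
    """Toy container: every field is a list with one array per spot.

    Hypothetical illustration, not the fretbursts.Data implementation.
    """

    def __init__(self, **fields):
        for name, per_spot in fields.items():
            setattr(self, name, list(per_spot))

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so "nd" resolves
        # normally and only names such as "nd_" reach this method.
        base = name[:-1]
        if (name.endswith('_') and not name.startswith('_')
                and base in self.__dict__):
            return self.__dict__[base][0]      # first (or only) spot
        raise AttributeError(name)


single = SpotFields(nd=[np.array([12, 30, 7])], na=[np.array([20, 5, 9])])
print(single.nd_)            # same object as single.nd[0]
\end{verbatim}
Raising \verb|AttributeError| for unknown names keeps \verb|hasattr| and \verb|getattr| with a default working as usual.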
All the built-in plot functions that can be passed to \verb|dplot| are defined in the \verb|burst_plot| module. When importing FRETBursts, all the plot functions are also imported. To make plot functions easy to find through auto-completion, all their names start with a prefix indicating the plot type. The prefixes are: \verb|timetrace| for binned timetraces of photon data, \verb|ratetrace| for photon rates as a function of time (non-binned), \verb|hist| for functions plotting histograms and \verb|scatter| for scatter plots. Additional plots can be created directly with matplotlib.

Usually plots are displayed inline in the notebook. However, a few plot functions such as \verb|timetrace| and \verb|hist2d_alex| have interactive features that can be enabled when using the QT4 backend, which opens the plot in an external window. It is possible to switch the backend from inline to QT and vice versa using the IPython commands \verb|%matplotlib qt| and \verb|%matplotlib inline|. For example, after switching to the QT4 backend, the following command:
\begin{verbatim}
dplot(d, timetrace, scroll=True)
\end{verbatim}
opens a new window with a timetrace plot and a horizontal scrollbar for quickly ``scrolling'' through the measurement. Similarly, calling the \verb|hist2d_alex| function with the QT4 backend allows selecting an area of the E-S histogram with the mouse:
\begin{verbatim}
dplot(ds, hist2d_alex, gui_sel=True)
\end{verbatim}
The values that identify the region are printed and can be copied and pasted as arguments of the burst selection function \verb|select_bursts.ES| (see section~\ref{sec:burstsel}).

\subsection{smFRET and burst analysis}
FRETBursts is a python package for burst analysis of confocal single-molecule FRET (smFRET) data. \textit{Expand abstract to introduce smFRET and what is burst analysis}.

\subsection{Installing FRETBursts}
FRETBursts is a standard python package that requires the ``scipy stack'', a set of core scientific python packages. The scipy stack is easily installed through a free scientific python distribution such as Continuum Anaconda, although some users may prefer another distribution or a manual installation. FRETBursts can be installed through the standard python package manager (PIP) with the command \texttt{pip install fretbursts}. Alternatively, the latest development version can be installed from GitHub. For more information on the different installation methods see the \href{http://fretbursts.readthedocs.org/en/latest/installation.html}{FRETBursts documentation}.

\subsection{Executing FRETBursts}
In general, we suggest importing FRETBursts with the expression:

\begin{verbatim}
>>> import fretbursts as fb
\end{verbatim}
which makes all the FRETBursts functions available with the concise \verb|fb.| prefix. In this article, however, we assume that FRETBursts is imported with the shortcut form:
\begin{verbatim}
>>> from fretbursts import *
\end{verbatim}
which allows skipping the \verb|fb.| prefix and also imports some common numeric libraries (numpy and matplotlib.pyplot, imported as \verb|np| and \verb|plt| respectively). Furthermore, we encourage using FRETBursts through the IPython Notebook environment. All the FRETBursts tutorials are IPython notebook documents and, indeed, a quick way to start a new analysis is copying a pre-existing FRETBursts notebook and modifying it. The ``notebook workflow''~\cite{Shen_2014} has the advantage of automatically recording all the analysis steps, including data file names, software versions, analysis details and the full output (figures, tables, etc.). The full, reproducible analysis becomes a document

\input{preamble}
\usepackage{hyperref}
\usepackage{listings}
\bibliographystyle{plain}
\author{Antonino Ingargiola}
\title{\input{title}}
\begin{document}
\maketitle
\begin{abstract}
\input{Abstract.tex}
\end{abstract}
\input{Introduction.tex}
\input{Concepts.tex}
\input{"Loading data"}
\input{"Background estimation"}
\begin{figure}
\includegraphics{"figures/ph_delays_distrib1/ph_delays_distrib1"}
\caption[]{\input{"figures/ph_delays_distrib1/caption"}}
\end{figure}
\input{"Burst search.tex"}
\input{"Burst selection"}
\input{Fitting}
\input{Conclusions}
\bibliography{bibliography/biblio}
\end{document}