AbstractAccurate quantification of cellular and mitochondrial bioenergetic activity is of great interest in many medical and biological areas. Mitochondrial stress experiments performed with Seahorse Bioscience XF Analyzers allow estimating 6 bioenergetics measures: basal respiration, ATP production, proton leak, maximal respiration, spare capacity, and non-mitochondrial respiration by monitoring oxygen consumption rates (OCR) of living cells in multi-well plates. However, detailed statistical analyses of OCR measurements from XF Analyzers have been lacking so far. Here, we performed 126 mitochondrial stress experiments involving 206 fibroblast cell lines. We show that the noise of OCR is multiplicative and that outlier data points can concern individual measurements or all measurements of a well. Based on these insights, we developed a novel statistical method, OCR-Stats, that: i) models multiplicative noise, ii) takes into account replicates both within and between plates, and iii) automatically identifies outlier data points and outlier wells. This led to a significant reduction of the coefficient of variation across experiments of basal respiration, by 36% (P = 0.004), and of maximal respiration, by 32% (P = 0.023). After analyzing the inter and intra-plate variances, we suggest a minimum number of well and plate replicates needed in order to obtain confident results. We proposed two different approaches to compare the OCR levels of two samples, which depend on the experimental design. In this context, OCR-Stats largely increases the sensitivity over the state-of-the-art method. OCR-Stats can easily be applied for analyzing the extracellular acidification rates (ECAR) assessed by the glycolysis stress test also provided by the XF Analyzer.Keywords:Oxygen Consumption Rate (OCR), mitochondrial respiration, bioenergetics, statistical testing, outlier detection.IntroductionMitochondria are double membrane enclosed, ubiquitous, maternally inherited, cytoplasmic organelles present in most eukaryotic organisms \cite{Gorman_Chinnery_DiMauro_Hirano_Koga_McFarland_Suomalainen_Thorburn_Zeviani_Turnbull_2016}al.,. 2016). They are the bioenergetics factories of the cell \cite{Bhola_Letai_2016,Sun_Youle_Finkel_2016}, and are also involved in regulating reactive oxygen species \cite{Wallace_2007}, apoptosis \cite{Bhola_Letai_2016}, amino acid synthesis \cite{Birsoy_Wang_Chen_Freinkman_Abu-Remaileh_Sabatini_2015,Sullivan2015a}, cell proliferation \cite{Sullivan2015a}, cell signaling \cite{Zong_Rabinowitz_White_2016},and in the regulation of innate and adaptive immunity \cite{Weinberg_Sena_Chandel_2015}(Weinberg. 2015). It follows that a decline in mitochondrial function, reflected by a diminished electron transport chain activity, is implicated in many human diseases that range from common disorders such as cancer \cite{Wallace_2012,Zong_Rabinowitz_White_2016} 6), diabetes \cite{Dunham-Snary_Sandel_Westbrook_Ballinger_2014}, Dunham-Snaryet al., 2014), neurodegeneration \cite{Yao_Irwin_Zhao_Nilsen_Hamilton_Brinton_2009} and aging \cite{Sun_Youle_Finkel_2016}, to rare genetic disorders \cite{Titov_Cracan_Goodman_Peng_Grabarek_Mootha_2016}. One of the most informative assessments of mitochondrial function is the quantification of cellular respiration, as it directly reflects electron transport chain dysfunction \cite{Titov_Cracan_Goodman_Peng_Grabarek_Mootha_2016} and depends on many sequential reactions from glycolysis to oxidative phosphorylation \cite{Koopman_Michels_Dancy_Kamble_Mouchiroud_Auwerx_Nollen_Houtkooper_2016a}. Estimations of oxygen consumption rates (OCR) expressed in pmol/min, which are mainly driven by mitochondrial respiration through oxidative phosphorylation, and extracellular acidification rates (ECAR) expressed in mpH/min, which reflect glycolysis \cite{Divakaruni_Paradyse_Ferrick_Murphy_Jastroch_2014,Ferrick_Neilson_Beeson_2008,Koopman_Michels_Dancy_Kamble_Mouchiroud_Auwerx_Nollen_Houtkooper_2016a}, are more conclusive for the ability to synthesize ATP and mitochondrial function than measurements of intermediates (such as ATP or NADH) and potentials \cite{Brand_Nicholls_2011,Dmitriev_Papkovsky_2012}.OCR was classically measured using a Clark-type electrode, which required a substantial amount of purified mitochondria, was time consuming, and did not allow automated injection of compounds \cite{Wu2007}. Also, experimentation with isolated mitochondria is ineffective because cellular regulation of mitochondrial function is removed during isolation \cite{Hill_Benavides_Jr_Ballinger_Italia_2012}. In the last few years, a new technology using fluorescent oxygen sensors \cite{Gerencser_Neilson_Choi_Edman_Yadava_Oh_Ferrick_Nicholls_Brand_2009} in a microplate assay format has been developed by the company Seahorse Bioscience (now part of Agilent Technologies) \cite{Ribeiro_Giménez-cassina_Danial_2015}. It allows simultaneous, real-time measurements of both OCR and ECAR in multiple cell lines and conditions, reducing the sample material required and increasing the throughput \cite{Divakaruni_Paradyse_Ferrick_Murphy_Jastroch_2014,Ribeiro_Giménez-cassina_Danial_2015}. However, high throughput assays demand for robust statistical methods to combine their results.Typically, OCR and ECAR levels are measured using the Seahorse XF Analyzer in 96 (or 24) well-plates at multiple time steps under three consecutive treatments, in a procedure known as mitochondrial stress test \cite{SeahorseBioscience}. These conditions are obtained by systematically injecting Oligomycin, FCCP and Rotenone (Fig. 1B). Under basal conditions, complexes I-IV exploit energy derived from electron transport to pump protons across the inner mitochondrial membrane. The thereby generated proton gradient is subsequently harnessed by complex V to generate ATP. Blockage of the proton translocation through complex V by Oligomycin represses ATP production and prevents the electron transport throughout complexes I-IV due to the unexploited gradient. Administration of FCCP, an ionophor, subsequently dissipates the gradient uncoupling electron transport from complex V activity and increasing oxygen consumption. Finally, mitochondrial respiration is completely halted using the complex I inhibitor Rotenone whereby the residual non-mitochondrial respiration can be assessed. This approach is label-free and non-destructive, so the cells can be retained and used for further assays \cite{Ferrick_Neilson_Beeson_2008}. The assay enables the estimation of six different bioenergetic measures: basal respiration, proton leak, non-mitochondrial respiration, ATP production, spare respiratory capacity, and maximal respiration, which can be deduced from differences of OCRs between conditions \cite{Brand_Nicholls_2011,Divakaruni_Paradyse_Ferrick_Murphy_Jastroch_2014}. Increase in proton leak and decrease in maximum respiratory capacity are indicators of mitochondrial dysfunction \cite{Brand_Nicholls_2011}. ATP production, basal respiration, and spare capacity alter in response to ATP demand, which is not necessarily mitochondrion-related as it may be the consequence of deregulation of any cellular process altering general cellular energy demand. Analogously, different treatments can be injected to modify ECAR levels and obtain key parameters of glycolytic function: glycolysis, glycolytic capacity, glycolytic reserve and non-glycolytic acidification (Fig. S1), in an assay known as glycolysis stress test \cite{SeahorseBioscience2012}.Current literature describing the Seahorse technology addressed experimental aspects regarding sample preparation \cite{Dranka_Hill_Darley-usmar_2010,Zhang_Nuebel_Wisidagama_Setoguchi_Hong_2013}, the amount of cells to seed \cite{Zhang_Nuebel_Wisidagama_Setoguchi_Hong_2013,Zhou2012}, and compound concentration in different organisms \cite{Dranka2013,Koopman_Michels_Dancy_Kamble_Mouchiroud_Auwerx_Nollen_Houtkooper_2016a,Shah-Simpson_Pereira_Dumoulin_Caradonna_Burleigh_2016}. However, studies suggesting statistical best practices for Seahorse flux analyzer data are lacking. The goal of our study was to precisely provide such best practices by analyzing a large-scale dataset of 126 mitochondrial stress experiments (in 96-well plate format) involving 206 different fibroblast cell lines. We obtained robust and accurate oxygen consumption rates for each well, which translates into robust summarized values of the multiple replicates inside one plate and across experiments. Our statistical procedure includes normalization of raw data and outlier identification and led to an increase in the accuracy over state-of-the-art methods. Then, we focused on how to statistically compare the results of 2 different samples. First, we noticed that the variation across experiments is dominant with respect to the variation inside, suggesting that it is necessary.ResultsExperimental design and raw dataWe derived OCR, ECAR, and cell number for 205 dermal fibroblast cultures derived from patients suffering from rare mitochondrial diseases, and control cells from healthy donors (normal human dermal fibroblast - NHDF, Methods). These were assayed in 126 experiments (or plates), all using the same protocol (\ref{625784}). We grew 28 cell lines multiple times and placed them in more than one plate, and we will refer to these replicates as different samples. The NHDF cell line was seeded in all experiments for batch correction. All four corners of each plate were left as blank, i.e. filled with media but no cells to control for changes in temperature \cite{Dranka_Hill_Darley-usmar_2010}. The typical layout of a plate is depicted in Fig. 1C, showing how each sample is present in many well replicates. We seeded between 3 and 7 samples per experiment (median = 4). This variation reflects typical set-ups of experiments in a lab gathered over multiple months or years.We used the standard cellular respiration treatments (Fig. 1A) leading to four intervals with three time points each and denoted by Int1 (resting cells), Int2 (after Oligomycin), Int3 (after FCCP) and Int4 (after Rotenone). Wells for which the median OCR level did not follow the expected trend, namely, median(OCR(Int3)) > median(OCR(Int1)) > median(OCR(Int2)) > median(OCR(Int4)), were discarded (977 wells, 10.47%). We also excluded from the analysis wells in which the cells got detached and contaminated wells (461 wells, 4.94%, \ref{625784}).Random and systematic variations between well replicatesTypical replicate time series are shown in Fig. 1C, with data from 8 wells for a single sample in a single plate. It shows the typical kinds of variations that we observed.First, outlier data points occurred frequently. We distinguished two different types of outliers: entire series for a well (e.g., well G5 in Fig. 1C) and individual data points (e.g., well B6 at time point 6 in Fig. 1C). In the latter case, eliminating the entire series for well B6 would be too restrictive, and would result in losing valuable data from the other 11 valid time points. Therefore, we propose a method to find outliers considering these two possibilities independently.Second, we noticed that the higher the OCR value, the higher the variance between replicates, suggesting that the error is multiplicative. Unequal variance, or heteroscedasticity, can strongly affect the validity of statistical tests and the robustness of estimations. We therefore considered modeling OCR on a logarithmic scale, where the dependency between variance and mean disappears (Figs. 2A, 2B).Third, we observed systematic biases in OCR between wells (e.g., OCR values of well C6 are among the highest while OCR values of well B5 are among the lowest at all time points). Variations in cell number, initial conditions, treatment concentrations, and fluorophore sleeve calibration can lead to systematic differences between wells, which we refer to as well biases. To investigate whether well biases could be corrected using cell number as suggested in \cite{Dranka_Hill_Darley-usmar_2010}, we counted the number of cells after the experiments using Cyquant (\ref{625784}). As expected, median OCR for each interval grows linearly with cell number measured at the end of the experiment (Spearman rho between 0.32 and 0.47, P < 2.2e-16, Fig. S2A). However, the relation is not perfect reflecting important additional sources of variations but also possible noise in measuring cell number. Strikingly, dividing OCR by cell count led to a higher coefficient of variation (standard deviation divided by the mean) between replicate wells than without that correction (Fig. S2B). This analysis showed that normalization for cell number should not be done simply by a blunt division by raw cell counts and motivated us to derive another method to capture well biases.A statistical model of OCRFor a given biological sample in one plate, we modeled the logarithm of OCR of well w at time point t as a sum of well bias, interval effects and noise, i.e.,: