Thomas Lin Pedersen edited Materials and methods.tex  over 9 years ago

Commit id: 7e7ec06445c9e5364379fb2495cd5a68c577fe04


\section{Materials and Methods}

\subsection{Data}

The data used in this study were provided by \citet{24494671} and match those used in their paper. They are a collection of metrics extracted using QuaMeter from samples across a range of US laboratories (Vanderbilt University Medical Center, Pacific Northwest National Laboratory, Broad Institute and Johns Hopkins University).

The Velos data from Vanderbilt University Medical Center were used as example data, as they comprised the most samples over a long period of time. All samples were divided into runs by looking at the time difference between each sample and the next. A time difference exceeding 2 hours constituted the start of a new run. Using this approach, 37 runs were identified in the dataset, with a median size of 15 samples. Two of the runs included only one sample and were subsequently removed. In each run the first, middle and last samples were designated as standards used to monitor between-run variation. A stable instrument period between February 25\textsuperscript{th} and April 15\textsuperscript{th} 2013 was identified, and the samples from that period were used as a training set for between-run variation. The training set thus included 89 samples. For within-run analysis, run 6 (August 31\textsuperscript{st} 2012 onwards) was chosen, as it comprised 15 samples including one obvious outlier and a few subtle ones.

\subsection{Data analysis}

All analyses were performed in the statistical computing environment R \citep{R}, using additional packages that are referenced where they are described. The code used for performing the analyses is available in the appendix. The one exception is the calculation of Angle Based Outlier Detection, which was done using ELKI \citep{ELKI}, as this software contained the only known implementation.
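The run segmentation described in the Data subsection can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R code from the appendix. The function names (\texttt{split\_into\_runs}, \texttt{drop\_singletons}, \texttt{standard\_indices}) are hypothetical, the timestamps are invented for illustration, and taking the middle sample as index \texttt{len(run) // 2} is an assumption, since the text does not say how the middle is chosen in runs of even size.

```python
from datetime import datetime, timedelta


def split_into_runs(timestamps, max_gap=timedelta(hours=2)):
    """Group acquisition times into runs.

    A gap strictly greater than max_gap between consecutive samples
    starts a new run (assumed reading of the 2 hour rule in the text).
    """
    runs = []
    for t in sorted(timestamps):
        if runs and t - runs[-1][-1] <= max_gap:
            runs[-1].append(t)   # still within the current run
        else:
            runs.append([t])     # gap exceeded, or very first sample
    return runs


def drop_singletons(runs):
    """Remove runs that contain only one sample."""
    return [run for run in runs if len(run) > 1]


def standard_indices(run):
    """Positions of the first, middle and last sample in a run.

    The middle index len(run) // 2 is an assumed convention.
    """
    return sorted({0, len(run) // 2, len(run) - 1})


# Invented example timestamps, not taken from the dataset.
start = datetime(2012, 8, 31, 8, 0)
times = [start + timedelta(minutes=45 * i) for i in range(5)]   # one run of 5
times.append(datetime(2012, 8, 31, 15, 0))                      # isolated sample
times += [datetime(2012, 9, 1, 9 + i, 0) for i in range(3)]     # one run of 3

runs = split_into_runs(times)
kept = drop_singletons(runs)
for run in kept:
    print(len(run), standard_indices(run))
```

In this toy input the isolated sample forms a run of its own and is dropped, mirroring the removal of the two single-sample runs described above.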