\section{Evaluating the performance of a binding predictor}

Two datasets were used from a recent paper studying the relationship between training data and pMHC predictor accuracy~\cite{Kim_2014}. The training dataset (BD2009) contained entries from IEDB~\cite{Salimi_2012} up to 2009, and the test dataset (BLIND) contained IEDB entries from 2010 through 2013 that did not overlap with BD2009 (Table~\ref{tab:datasets}).

\begin{table}[h!]
\centering
\begin{tabular}{l||ccc}
\toprule
{} & Alleles & IC$_{50}$ Measurements & Expanded 9mers \\
\midrule
BD2009 & 106 & 137,654 & 470,170 \\
BLIND  & 53  & 27,680  & 83,752  \\
\bottomrule
\end{tabular}
\caption{Train (BD2009) and test (BLIND) dataset sizes.}
\label{tab:datasets}
\end{table}

Throughout this paper we evaluate a pMHC binding predictor using three metrics, which can be computed as sketched below:
\begin{itemize}
\item {\bf F$_1$ score}: Measures the trade-off between precision and recall when predicting ``strong binders'', defined as peptides with affinities $\leq 500$\,nM.
\item {\bf AUC}: Area under the ROC curve. Estimates the probability that a ``strong binder'' peptide will be assigned a stronger predicted affinity than one whose ground-truth affinity is $> 500$\,nM.
\item {\bf Kendall's $\tau$}: Rank correlation across the full spectrum of binding affinities.
\end{itemize}
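To make the evaluation concrete, the following is a minimal sketch of how these three metrics might be computed with \texttt{scikit-learn} and \texttt{scipy} (neither library is named in the text; the function name \texttt{evaluate\_predictor} and the paired-array interface are illustrative assumptions, not the paper's actual evaluation code):

\begin{verbatim}
import numpy as np
from scipy.stats import kendalltau
from sklearn.metrics import f1_score, roc_auc_score

STRONG_BINDER_NM = 500.0  # strong-binder affinity threshold from the text

def evaluate_predictor(measured_nm, predicted_nm):
    """Score a predictor given paired measured/predicted IC50 arrays (nM)."""
    measured_nm = np.asarray(measured_nm, dtype=float)
    predicted_nm = np.asarray(predicted_nm, dtype=float)

    true_binder = measured_nm <= STRONG_BINDER_NM
    pred_binder = predicted_nm <= STRONG_BINDER_NM

    # F1 score: harmonic mean of precision and recall on the
    # strong-binder class.
    f1 = f1_score(true_binder, pred_binder)

    # AUC: lower IC50 means stronger binding, so negate predictions to
    # get a score that increases with predicted binding strength.
    auc = roc_auc_score(true_binder, -predicted_nm)

    # Kendall's tau: rank correlation over the full affinity spectrum,
    # not just measurements near the 500 nM threshold.
    tau, _ = kendalltau(measured_nm, predicted_nm)

    return {"F1": f1, "AUC": auc, "Kendall tau": tau}
\end{verbatim}

Note that negating the predicted affinities for the AUC computation only reverses their rank order, so the AUC is invariant to any monotone transformation of the predictions (e.g., working in $\log$ IC$_{50}$ space gives the same value).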