\section{Evaluating the performance of a binding predictor}

Two datasets were used from a recent paper studying the relationship between training data and pMHC predictor accuracy~\cite{Kim_2014}. The training dataset (BD2009) contained entries from IEDB~\cite{Salimi_2012} up to 2009, and the test dataset (BLIND) contained IEDB entries from 2010 through 2013 that did not overlap with BD2009 (Table~\ref{tab:datasets}).

\begin{table}[h!]
\centering
\begin{tabular}{l||ccc}
\toprule
{} & Alleles & IC$_{50}$ Measurements & Expanded 9mers \\
\midrule
BD2009 & 106 & 137,654 & 470,170 \\
BLIND  & 53  & 27,680  & 83,752  \\
\bottomrule
\end{tabular}
\caption{Train (BD2009) and test (BLIND) dataset sizes.}
\label{tab:datasets}
\end{table}

Throughout this paper we evaluate a pMHC binding predictor using three metrics, which can be computed as sketched below:
\begin{itemize}
\item {\bf F$_1$ score}: Measures the trade-off between precision and recall when predicting ``strong binders'', defined as peptides with affinities $\leq 500$\,nM.
\item {\bf AUC}: Area under the ROC curve. Estimates the probability that a ``strong binder'' peptide will be assigned a stronger predicted affinity than one whose ground-truth affinity is $> 500$\,nM.
\item {\bf Kendall's $\tau$}: Rank correlation across the full spectrum of binding affinities.
\end{itemize}
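To make the evaluation concrete, the following is a minimal sketch of how these three metrics might be computed with \texttt{scikit-learn} and \texttt{scipy} (neither library is named in the text; the function name \texttt{evaluate\_predictor} and the paired-array interface are illustrative assumptions, not the paper's actual evaluation code):

\begin{verbatim}
import numpy as np
from scipy.stats import kendalltau
from sklearn.metrics import f1_score, roc_auc_score

STRONG_BINDER_NM = 500.0  # strong-binder affinity threshold from the text

def evaluate_predictor(measured_nm, predicted_nm):
    """Score a predictor given paired measured/predicted IC50 arrays (nM)."""
    measured_nm = np.asarray(measured_nm, dtype=float)
    predicted_nm = np.asarray(predicted_nm, dtype=float)

    true_binder = measured_nm <= STRONG_BINDER_NM
    pred_binder = predicted_nm <= STRONG_BINDER_NM

    # F1 score: harmonic mean of precision and recall on the
    # strong-binder class.
    f1 = f1_score(true_binder, pred_binder)

    # AUC: lower IC50 means stronger binding, so negate predictions to
    # get a score that increases with predicted binding strength.
    auc = roc_auc_score(true_binder, -predicted_nm)

    # Kendall's tau: rank correlation over the full affinity spectrum,
    # not just measurements near the 500 nM threshold.
    tau, _ = kendalltau(measured_nm, predicted_nm)

    return {"F1": f1, "AUC": auc, "Kendall tau": tau}
\end{verbatim}

Note that negating the predicted affinities for the AUC computation only reverses their rank order, so the AUC is invariant to any monotone transformation of the predictions (e.g., working in $\log$ IC$_{50}$ space gives the same value).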