Timothy O'Donnell edited sectionContent_Text_.tex
almost 8 years ago
Commit id: d68e5676a05e385838dd7d9b420ca83015ecbff4
diff --git a/sectionContent_Text_.tex b/sectionContent_Text_.tex
index 9e342d3..b580d84 100644
--- a/sectionContent_Text_.tex
+++ b/sectionContent_Text_.tex
...
\section{Introduction}
In organisms with adaptive immunity, cells of all tissues expose on their surfaces small fragments of proteins extracted from the cytosol. These peptide fragments, typically but not always 9 amino acids in length, are monitored by T cells, which recognize and kill infected or cancerous cells based on their viral, bacterial, mutant, or unusual peptides~\cite{Anderson_2004}. To be presented, the peptides must bind to a major histocompatibility complex I (MHC I) protein, which forms a platform for interaction with T cells. There are many thousands of MHC alleles in the human population, each with a distinct binding preference for peptides.
Peptide-MHC affinity prediction is the well-studied problem of predicting the binding strength of a given peptide and MHC I allele pair~\cite{Lundegaard_2007}.
% (original): In most vertebrates, cytotoxic T-cells enforce multi-cellular order by killing both infected and cancerous cells. Each individual organism possesses a poly-clonal army of T-cells which collectively are able to distinguish rare unhealthy cells from healthy ones. This amazing feat is achieved through the winnowing and expansion of clonal T-cell populations possessing highly specific T-cell receptors (TCRs)~\cite{Blackman_1990}. Each distinct TCR is able to recognize a small number of similar peptides bound to an MHC molecule on the surface of a cell~\cite{Huseby_2005}. Though there are many steps in ``antigen processing''~\cite{Cresswell_2005} (the process by which protein fragments find themselves loaded onto membrane-bound MHCs), it has become apparent that MHC binding is the most restrictive step and consequently the most important sub-problem of predicting T-cell epitopes.
Initial approaches to predicting MHC ligands focused on ``sequence motifs''~\cite{Sette_1989}. These were quickly replaced by a variety of regularized linear models, which in turn are consistently outperformed by regularized linear models with interaction terms, such as SMM~\cite{Peters_2003}. The march toward black-box non-linear models reached its local maximum with the NetMHC family of predictors, a collection of related models that use ensembles of neural networks. Two of these predictors in particular, NetMHC~\cite{Lundegaard_2008} and NetMHCpan~\cite{Nielsen_2007}, have emerged as the preferred methods for computational prediction of MHC ligands across several areas of immunology, including virology~\cite{Lund_2011}, tumor immunology~\cite{Gubin_2015}, and autoimmunity~\cite{Abreu_2012}.
% Though there are many steps in ``antigen processing''~\cite{Cresswell_2005}, it has become apparent that MHC binding is the most restrictive step and consequently the most important sub-problem of predicting T-cell epitopes.
The primary difference between NetMHC and NetMHCpan is that the former is an {\it allele-specific} method that trains a separate predictor on each allele's binding dataset, whereas the latter is a {\it pan-allele} method whose inputs are vector encodings of both the peptide and a subset of the MHC molecule's primary sequence. The conventional wisdom is that NetMHC performs better on alleles with many assayed ligands, whereas NetMHCpan is superior for less well-characterized alleles~\cite{Gfeller_2016}.
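The difference in model inputs can be illustrated with a minimal sketch. It assumes a plain one-hot encoding, which is chosen here for simplicity and is not claimed to be the encoding used by NetMHC or NetMHCpan. The function names, the example peptide, and the MHC residue string are hypothetical placeholders.

```python
# Illustrative sketch only: one-hot encoding is an assumption made for clarity,
# not the encoding used by NetMHC or NetMHCpan.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence):
    """Flatten a sequence into a vector with 20 entries per residue."""
    vector = []
    for residue in sequence:
        column = [0.0] * len(AMINO_ACIDS)
        column[AMINO_ACIDS.index(residue)] = 1.0
        vector.extend(column)
    return vector


def allele_specific_input(peptide):
    # One predictor per allele: the allele is implicit, so only the
    # peptide is encoded.
    return one_hot(peptide)


def pan_allele_input(peptide, mhc_residues):
    # One shared predictor: the peptide encoding is concatenated with an
    # encoding of selected residues from the MHC sequence.
    return one_hot(peptide) + one_hot(mhc_residues)


example_peptide = "ACDEFGHIK"        # hypothetical 9-mer
example_mhc_residues = "YFAMYQENM"   # hypothetical subset of MHC residues

specific = allele_specific_input(example_peptide)
pan = pan_allele_input(example_peptide, example_mhc_residues)
print(len(specific), len(pan))
```

Under these assumptions the pan-allele input is simply longer than the allele-specific one, because the allele's identity is supplied as extra input features instead of being fixed by the choice of model.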
In this paper we explore the space between {\it allele-specific} and {\it pan-allele} prediction by imputing peptide-MHC affinities for which we have no measurements and using these imputed values to pre-train allele-specific binding predictors.
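The data flow behind this idea can be sketched as follows. The sketch assumes a deliberately simple stand-in for imputation (filling each missing entry with the mean of that peptide's measured values on other alleles); it is not the imputation method used in this paper. All function names, allele labels, peptides, and affinity values are hypothetical.

```python
# Illustrative sketch only: per-peptide mean imputation is a stand-in
# assumption, and all names and values below are hypothetical.

def impute_by_peptide_mean(measured, alleles, peptides):
    """Fill missing (allele, peptide) entries with the peptide's mean
    over the alleles on which it was measured."""
    completed = dict(measured)
    for peptide in peptides:
        known = [measured[(allele, peptide)]
                 for allele in alleles if (allele, peptide) in measured]
        if not known:
            continue  # no measurements to borrow from; leave the gap
        mean = sum(known) / len(known)
        for allele in alleles:
            completed.setdefault((allele, peptide), mean)
    return completed


def training_stages(measured, completed, allele):
    """Split one allele's data into imputed values (for pre-training)
    and measured values (for subsequent training)."""
    pretrain = [(p, v) for (a, p), v in completed.items()
                if a == allele and (a, p) not in measured]
    finetune = [(p, v) for (a, p), v in measured.items() if a == allele]
    return pretrain, finetune


alleles = ["allele_1", "allele_2", "allele_3"]
peptides = ["ACDEFGHIK", "LMNPQRSTV"]
measured = {
    ("allele_1", "ACDEFGHIK"): 0.8,
    ("allele_2", "ACDEFGHIK"): 0.6,
    ("allele_2", "LMNPQRSTV"): 0.2,
    ("allele_3", "LMNPQRSTV"): 0.4,
}

completed = impute_by_peptide_mean(measured, alleles, peptides)
pretrain, finetune = training_stages(measured, completed, "allele_1")
print(pretrain, finetune)
```

An allele-specific predictor for `allele_1` would first be fit to `pretrain` and then to `finetune`, so that the measured affinities have the final say while the imputed ones supply additional training signal.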