Predicting Peptide-MHC Binding Affinities With Imputed Training Data

This repository contains the data, analysis notebooks, and Authorea-generated LaTeX files for the paper "Predicting Peptide-MHC Binding Affinities With Imputed Training Data", submitted to the ICML 2016 Workshop on Computational Biology.

Data and notebooks

Predictions on the blind test data from the MHCflurry predictors, netMHC, netMHCpan, and SMM are available in data/validation_predictions_full.csv. This file includes predictions from 64 MHCflurry models: 32 trained with imputation and 32 trained without. The models are described in data/validation_models.csv.

The notebook that trains the predictors and generates these results is notebooks/validation.ipynb. It took about 20 hours to run on a single TITAN X GPU.

The analysis of these results, including generating ensemble predictions from the individual predictors and calculating AUC, F1, and tau scores, is in notebooks/validation results analysis.ipynb.
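For readers who want a feel for these metrics before opening the notebook, here is a minimal, dependency-free sketch. It is not the notebook's code. It assumes that the ensemble is a plain mean over models, that a peptide counts as a binder at or below 500 nM, and that "tau" means Kendall's tau (the tau-a variant, which ignores ties). The model names and all values are made up for illustration.

```python
from itertools import combinations

BINDER_THRESHOLD_NM = 500.0  # assumed cutoff, not taken from the notebook


def ensemble_mean(predictions_by_model):
    """Average per-peptide predictions across models (assumed ensembling rule)."""
    columns = list(predictions_by_model.values())
    return [sum(values) / len(values) for values in zip(*columns)]


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def f1(labels, predicted):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        concordant += sign > 0
        discordant += sign < 0
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy measured and predicted IC50 values in nM (illustrative only).
measured = [20.0, 150.0, 800.0, 5000.0, 30000.0]
predictions = {
    "model_a": [50.0, 900.0, 400.0, 8000.0, 20000.0],  # hypothetical model
    "model_b": [30.0, 700.0, 200.0, 2000.0, 40000.0],  # hypothetical model
}

ensemble = ensemble_mean(predictions)
true_binders = [m <= BINDER_THRESHOLD_NM for m in measured]
predicted_binders = [p <= BINDER_THRESHOLD_NM for p in ensemble]

# Lower IC50 means stronger binding, so negate predictions to get a score.
auc_score = auc(true_binders, [-p for p in ensemble])
f1_score = f1(true_binders, predicted_binders)
tau_score = kendall_tau(measured, ensemble)

print(auc_score, f1_score, tau_score)
```

AUC and F1 depend on the binder threshold, while tau only compares the ranking of predicted and measured affinities, so the three scores can disagree on which predictor is best.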

The command to generate the data for Figure 1 was:

mhcflurry-dataset-size-sensitivity.py \
	--allele HLA-A0201  \
	--training-csv data/bdata.2009.mhci.public.1.txt \
	--imputation-method mice \
	--number-dataset-sizes 15 \
	--random-negative-samples 0 \
	--min-observations-per-peptide 3 \
	--training-epochs 250 \
	--repeat 3 \
	--max-training-samples 500 \
	--min-training-samples 10 \
	--dropout 0.5 \
	--hidden-layer-size 64 \
	--embedding-size 32

Versions

We used MHCflurry revision 52a88ace.

Other libraries:

appdirs==1.4.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
biopython==1.66
bottle==0.12.9
certifi==2016.2.28
CherryPy==5.1.0
climate==0.4.6
configparser==3.3.0.post2
CVXcanon==0.0.23.4
cvxopt==1.1.8
cvxpy==0.4.0
cycler==0.10.0
datacache==0.4.17
decorator==4.0.9
dill==0.2.5
downhill==0.3.2
ecos==2.0.4
entrypoints==0.2.1
-e git@github.com:hammerlab/fancyimpute.git@c4510c5a77fcf27af65149610f260f18826129a4#egg=fancyimpute
functools32==3.2.3.post2
h5py==2.6.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.2
Jinja2==2.8
jsonschema==2.5.1
jupyter-client==4.2.2
jupyter-core==4.1.0
Keras==1.0.2
lxml==3.6.0
MarkupSafe==0.23
matplotlib==1.5.1
mistune==0.7.2
multiprocess==0.70.4
nbconvert==4.2.0
nbformat==4.0.1
notebook==4.2.0
numpy==1.10.4
pandas==0.18.0
pathlib2==2.1.0
-e git@github.com:hammerlab/pepdata.git@a76e9606a24ff0d1b4c817182cdd06d5c75ba169#egg=pepdata
pexpect==4.0.1
pickleshare==0.7.2
plac==0.9.1
progressbar33==2.4
ptyprocess==0.5.1
pycairo==1.10.0
Pygments==2.1.3
pyparsing==2.1.1
python-dateutil==2.5.2
pytz==2016.3
PyYAML==3.11
pyzmq==15.2.0
requests==2.10.0
scikit-learn==0.17.1
scipy==0.17.0
scs==1.2.6
seaborn==0.7.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
terminado==0.6
Theano==0.9.0.dev0
toolz==0.7.4
tornado==4.3
traitlets==4.2.1
typechecks==0.0.2
widgetsnbextension==1.2.1
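
To compare a local environment against the pins above, something like the following sketch may help. It is not part of this repository. It assumes a modern Python (3.8 or later, for importlib.metadata) rather than the interpreter used for the paper, and the helper names parse_freeze and find_mismatches are hypothetical. Editable installs are matched only by their "#egg=" name, so the example line uses placeholders for the repository URL and commit.

```python
import re
from importlib import metadata  # available in Python 3.8+


def parse_freeze(lines):
    """Map package name -> pinned version (None for editable installs)."""
    pins = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-e "):
            # Editable installs carry the package name in the "#egg=" fragment.
            match = re.search(r"#egg=([A-Za-z0-9_.\-]+)", line)
            if match:
                pins[match.group(1)] = None
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for missing or differing packages."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or (pinned is not None and installed != pinned):
            mismatches[name] = (pinned, installed)
    return mismatches


# A few lines in the same format as the list above.
example_lines = [
    "numpy==1.10.4",
    "Keras==1.0.2",
    "-e <repo-url>@<commit>#egg=fancyimpute",  # placeholder URL and commit
]

pins = parse_freeze(example_lines)
for name, (pinned, installed) in sorted(find_mismatches(pins).items()):
    print(f"{name}: pinned={pinned} installed={installed}")
```

In practice the full list would be read from a file and passed to parse_freeze. Many of these pins are old enough that they may not install on a current interpreter, so a mismatch report is a starting point rather than a guarantee of reproducibility.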