Neoantigen Identification Challenge

Challenge Goals

Recognition of tumor neoantigens is a major factor in the activity of clinical immunotherapies (Schumacher 2015). Therapeutic vaccines targeting neoantigens (Hacohen 2013) are showing promising results in clinical trials (Carreno 2015).

There are a few published pipelines for neoantigen identification (Hundal 2016), with accompanying open source software.

We'd like to improve the accuracy of neoantigen identification by comparing the performance and identifying the most informative stages of each pipeline.

Overview of the pipeline pVAC-Seq: This figure illustrates the methodological framework behind the pVAC-Seq pipeline. Starting with preparation of inputs, it consists of three main steps - epitope prediction, integration of sequencing information, and filtered candidate selection (Hundal 2016)

Challenge Design

We'll be collecting blood, somatic tissue, and tumor-infiltrating lymphocytes (TILs) from patients with a few different types of solid tumors. Whole-exome sequencing (WES) will be performed on the blood and somatic tissue and RNA-Seq will be performed on the somatic tissue.

Input data

  • Germline WES DNA (FASTQ)
  • Somatic WES DNA (FASTQ)
  • Somatic RNA (FASTQ)

Data to submit

  • VCF with predicted somatic mutations
  • CSV with predicted HLA class I types
  • CSVs with the ranked list of predicted neoepitopes
    • 1 CSV for each stage of the neoepitope prediction pipeline, with a descriptive label for the action performed by each stage (e.g. "expression filter", "trunk mutation")


A consolidated list of predicted neoepitopes will be built from all challenge submissions and a subset of those peptides will be synthesized and screened against the TILs from the patients using a proprietary assay. The final scoring algorithm has not been determined but will likely be related to the Normalized Discounted Cumulative Gain (NDCG) or F1 score. The results of the assay will be provided to all challenge participants.

Note that the primary goal of this challenge is to learn what pipeline stages are most informative for neoepitope identification, however, so in addition to scoring the final neoepitope lists we'll also be looking to see at which stage correct neoepitopes were filtered out.