Neoantigen Identification Challenge

Challenge Goals

Recognition of tumor neoantigens is a major factor in the activity of clinical immunotherapies (Schumacher 2015). Therapeutic vaccines targeting neoantigens (Hacohen 2013) are showing promising results in clinical trials (Carreno 2015).

A few pipelines for neoantigen identification have been published (Hundal 2016), with accompanying open-source software.

We'd like to improve the accuracy of neoantigen identification by comparing the performance of these pipelines and identifying the most informative stages of each.

Overview of the pVAC-Seq pipeline. This figure illustrates the methodological framework behind the pVAC-Seq pipeline. Starting with the preparation of inputs, it consists of three main steps: epitope prediction, integration of sequencing information, and filtered candidate selection (Hundal 2016).

Challenge Design

We'll be collecting blood, somatic tissue, and tumor-infiltrating lymphocytes (TILs) from patients with a few different types of solid tumors. Whole-exome sequencing (WES) will be performed on the blood and somatic tissue and RNA-Seq will be performed on the somatic tissue.

Input data

  • Germline WES DNA (FASTQ)
  • Somatic WES DNA (FASTQ)
  • Somatic RNA (FASTQ)

Data to submit

  • VCF with predicted somatic mutations
  • CSV with predicted HLA class I types
  • CSVs with the ranked list of predicted neoepitopes
  • One CSV for each stage of the neoepitope prediction pipeline, with a descriptive label for the action performed by that stage (e.g. "expression filter", "trunk mutation")


A consolidated list of predicted neoepitopes will be built from all challenge submissions and a subset of those peptides will be synthesized and screened against the TILs from the patients using a proprietary assay. The final scoring algorithm has not been determined but will likely be related to the Normalized Discounted Cumulative Gain (NDCG) or F1 score. The results of the assay will be provided to all challenge participants.
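Since the scoring algorithm has not been determined, the following is only a minimal sketch of how NDCG and F1 could be computed for one submission. It assumes binary relevance (a peptide either elicited a response in the assay or did not) and a ranked list without duplicates. The function names and the placeholder peptide labels (PEP_A, etc.) are hypothetical and not part of the challenge specification.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_peptides, validated, k=None):
    """NDCG of a ranked peptide list against the set of assay-validated peptides.

    Assumption: binary relevance (1 if validated, 0 otherwise).
    """
    top = ranked_peptides if k is None else ranked_peptides[:k]
    gains = [1.0 if peptide in validated else 0.0 for peptide in top]
    # The ideal ranking places every validated peptide first.
    ideal_hits = len(validated) if k is None else min(len(validated), k)
    ideal = dcg([1.0] * ideal_hits)
    return dcg(gains) / ideal if ideal > 0 else 0.0


def f1(predicted, validated):
    """Harmonic mean of precision and recall for an unranked prediction set."""
    predicted = set(predicted)
    true_positives = len(predicted & set(validated))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(validated)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four ranked predictions, two of which validate.
ranked = ["PEP_A", "PEP_B", "PEP_C", "PEP_D"]
validated = {"PEP_A", "PEP_C"}
print(round(ndcg(ranked, validated), 4))
print(f1(ranked[:2], validated))
```

NDCG rewards placing validated peptides near the top of the list, whereas F1 ignores rank and depends on where the list is cut off, so the two scores can order submissions differently.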

Note, however, that the primary goal of this challenge is to learn which pipeline stages are most informative for neoepitope identification. In addition to scoring the final neoepitope lists, we'll therefore also look at the stage at which correct neoepitopes were filtered out.
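One way this per-stage analysis could work is sketched below. It assumes each submitted per-stage CSV has already been loaded into a set of the peptides retained after that stage, and that the stages are listed in pipeline order. The function name, the data layout, and the "binding prediction" label are hypothetical; the other two labels are the examples given above.

```python
def first_dropout_stage(stages, validated):
    """Map each validated peptide to the label of the first stage that dropped it.

    stages: ordered list of (label, set of peptides retained after that stage).
    A value of None means the peptide survived every stage.
    Caveat: a peptide that was never a candidate is reported at the first stage.
    """
    dropout = {}
    for peptide in validated:
        dropout[peptide] = None
        for label, retained in stages:
            if peptide not in retained:
                dropout[peptide] = label
                break
    return dropout


# Hypothetical submission with three labelled stages.
stages = [
    ("binding prediction", {"PEP_A", "PEP_B", "PEP_C"}),
    ("expression filter", {"PEP_A", "PEP_C"}),
    ("trunk mutation", {"PEP_A"}),
]
validated = {"PEP_A", "PEP_B", "PEP_C"}

for peptide, label in sorted(first_dropout_stage(stages, validated).items()):
    print(peptide, label or "retained in final list")
```

Tallying the drop-out labels across submissions would indicate which kinds of filter most often discard peptides that later validate in the assay.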


At what treatment stage will the patient samples be collected?

The TILs present in an untreated tumor are different from those present in a tumor treated with checkpoint blockade or a personalized neoantigen vaccine.

What information will be provided about the patient and their tumor?

To distinguish an incorrect epitope prediction from an epitope against which no immune response was mounted, it would be helpful to know the previous pathogen exposure and microbiome of the patient, as well as the capacity of their immune system to generate an immune response. It would also be helpful to know the immune contexture of the tumor, as a more immunosuppressive tumor microenvironment (TME) makes the possibility of no immune response more likely.

Will the TILs be phenotyped?

The proprietary T cell epitope discovery assay permits the phenotyping of T cells, so this may be possible. It would be useful to know whether the T cells are activated, anergic, or exhausted.

Will any intermediate validation be performed?

We may selectively validate intermediate outputs such as predicted somatic mutations, HLA type, and peptide/MHC binding strength.


  1. TN Schumacher, RD Schreiber. Neoantigens in cancer immunotherapy. Science 348, 69-74 (2015).

  2. N Hacohen, EF Fritsch, TA Carter, ES Lander, CJ Wu. Getting personal with neoantigen-based therapeutic cancer vaccines. Cancer Immunol Res 1, 11-5 (2013).

  3. BM Carreno, V Magrini, M Becker-Hapak, S