Development of the Comprehensive Genome Report
Comprehensive genome reports were modelled on clinical reports from the
Sinai Health System Laboratory (Toronto, ON) and reflect current
recommendations for reporting clinical GS results (Miller et al., 2021;
Green et al., 2013). In addition to monogenic disease risks and carrier
status, reports include expanded information on pharmacogenomics, PRS,
genetic ancestry, blood group and platelet antigens, human leukocyte
antigen (HLA) types, and SARS-CoV-2 viral lineage.
Reporting of pharmacogenomic variants is based on the Pharmacogenomics
Knowledge Base (PharmGKB) (Hewett et al., 2002) and includes annotations
of variant-drug interactions, clinical implications, and dosing
recommendations. PharmGKB annotations were appended to star alleles
called from participant GS data using Stargazer v1.0.8 (Lee et al.,
2019). Genotypes were identified for 17 pharmacogenes, including
structural variant analysis of CYP2D6. Genotypes of two HLA loci (HLA-A
and HLA-B) and of the variant rs12777823 were also included. Custom
Python scripts were used to assign the appropriate PharmGKB
recommendations to each pharmacogene based on the genotyping results.
Genotypes that did not meet the Clinical Pharmacogenetics Implementation
Consortium (CPIC) (Relling et al., 2011) Level A and/or PharmGKB Level 1A
evidence criteria for a given gene-medication association were excluded
from the final report.
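The evidence filter described above could be implemented along the
lines of the following Python sketch. It is illustrative only: the
record schema and function names are hypothetical rather than PharmGKB's
own format, star-allele calls are assumed to have already been
translated into metabolizer phenotypes, and because the "and/or" wording
admits two readings, requiring both the CPIC Level A and PharmGKB Level
1A thresholds is left as an option (require_both).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DrugAnnotation:
    """One gene-drug-phenotype record (hypothetical schema, not PharmGKB's own)."""
    gene: str
    drug: str
    phenotype: str
    recommendation: str
    cpic_level: Optional[str]      # e.g. "A" or "B"; None if no CPIC level assigned
    pharmgkb_level: Optional[str]  # e.g. "1A" or "2A"; None if not annotated


def meets_evidence(ann: DrugAnnotation, require_both: bool = False) -> bool:
    """Apply the CPIC Level A / PharmGKB Level 1A evidence threshold."""
    cpic_ok = ann.cpic_level == "A"
    pharmgkb_ok = ann.pharmgkb_level == "1A"
    return (cpic_ok and pharmgkb_ok) if require_both else (cpic_ok or pharmgkb_ok)


def assign_recommendations(
    phenotypes: Dict[str, str],
    annotations: List[DrugAnnotation],
    require_both: bool = False,
) -> List[DrugAnnotation]:
    """Keep annotations matching the participant's phenotype for that gene that pass the evidence filter."""
    selected = []
    for ann in annotations:
        # Genes not genotyped (missing key) or with a different phenotype are skipped.
        if phenotypes.get(ann.gene) != ann.phenotype:
            continue
        if meets_evidence(ann, require_both):
            selected.append(ann)
    return selected


if __name__ == "__main__":
    # Illustrative records only: recommendation text is a placeholder,
    # and GENE_X / drug_y are hypothetical.
    records = [
        DrugAnnotation("CYP2C19", "clopidogrel", "Poor Metabolizer",
                       "<PharmGKB recommendation text>", "A", "1A"),
        DrugAnnotation("GENE_X", "drug_y", "Intermediate Metabolizer",
                       "<recommendation text>", "B", "2A"),
    ]
    participant = {"CYP2C19": "Poor Metabolizer", "GENE_X": "Intermediate Metabolizer"}
    for ann in assign_recommendations(participant, records):
        print(ann.gene, ann.drug, ann.recommendation)
```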
Reporting of PRS results is informed by published accounts of patient
experiences and preferences (Brockman et al., 2021) and by previously
validated, ancestry-adjusted PRS assays for six common conditions (type
2 diabetes, coronary artery disease, atrial fibrillation, and breast
[female only], prostate [male only], and colon cancer) (Vassy et al.,
Preprint. DOI: 10.21203/rs.3.rs-743779/v1). Raw scores were adjusted for
genetic ancestry using ancestry-informative principal components to
calculate an ancestry-adjusted PRS (Vassy et al., Preprint. DOI:
10.21203/rs.3.rs-743779/v1).
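The ancestry adjustment can be illustrated with one common approach,
sketched below under stated assumptions: raw scores in a reference panel
are regressed on principal components (PCs), and a participant's score
is expressed as a standardized residual. This is not necessarily the
procedure of the cited preprint; the class and function names are
hypothetical, and converting the z-score to a percentile assumes
approximately normally distributed adjusted scores.

```python
from statistics import NormalDist, pstdev
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_pc_model(raw: Sequence[float], pcs: Sequence[Sequence[float]]) -> List[float]:
    """Ordinary least squares of raw PRS on an intercept plus PCs, via the normal equations."""
    x = [[1.0, *row] for row in pcs]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, raw)) for i in range(k)]
    return _solve(xtx, xty)


class AncestryAdjuster:
    """Hypothetical adjuster fitted on a reference panel: residualize on PCs, then standardize."""

    def __init__(self, ref_raw: Sequence[float], ref_pcs: Sequence[Sequence[float]]):
        self.beta = fit_pc_model(ref_raw, ref_pcs)
        residuals = [y - self._expected(p) for y, p in zip(ref_raw, ref_pcs)]
        self.sd = pstdev(residuals)

    def _expected(self, pcs_row: Sequence[float]) -> float:
        # Raw score predicted from ancestry (PCs) alone.
        return self.beta[0] + sum(b * p for b, p in zip(self.beta[1:], pcs_row))

    def adjust(self, raw: float, pcs_row: Sequence[float]) -> float:
        """Ancestry-adjusted PRS as a z-score relative to the reference panel."""
        return (raw - self._expected(pcs_row)) / self.sd

    def percentile(self, raw: float, pcs_row: Sequence[float]) -> float:
        """Convert the adjusted z-score to a percentile, assuming approximate normality."""
        return NormalDist().cdf(self.adjust(raw, pcs_row))
```

Only the mean of the score is adjusted in this sketch; approaches that
also model how score variance changes with ancestry exist but are
omitted for brevity.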
Reads for 22 HLA loci, including HLA-A, -B, -C, -DQA1, -DQB1, -DPB1, and
-DRB1, were extracted and aligned to the HLA v2 database from the
HLA-VBSeq package, which was used to estimate the most probable HLA
genotype from GS data at four-digit resolution (Mimori et al., 2019). HLA
alleles associated with increased autoimmune disease risk were identified
through a review of the literature indexed in PubMed®. The final list
included seven autoimmune diseases with
significant HLA associations (type 1 diabetes, celiac disease,
rheumatoid arthritis, ankylosing spondylitis, Behçet’s disease, multiple
sclerosis, and Graves’ disease). An HLA-disease association database was
developed with specific HLA haplotypes/genotypes, literature references,
risk and/or odds ratios (OR), p-values, and interpretation of
findings (Supplementary file 4). HLA calls from HLA-VBSeq were assigned
to the appropriate autoimmune disease association using custom Python
scripts.
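Assigning HLA calls to entries in the association database can be
sketched as below. The table schema and function names are hypothetical,
and the matching rule is an assumption: a four-digit call such as
B*27:05 is treated as matching an entry recorded at lower resolution
(B*27), and a multi-locus haplotype entry is matched when all of its
alleles are present among the participant's unphased calls, which does
not by itself confirm that the alleles lie on the same chromosome.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class HlaAssociation:
    """One row of a hypothetical HLA-disease association table."""
    disease: str
    alleles: Tuple[str, ...]            # all must be carried, e.g. a DQA1/DQB1 haplotype
    odds_ratio: Optional[float] = None  # curated value would be stored here
    reference: str = ""


def _split(allele: str) -> Tuple[str, List[str]]:
    """'HLA-B*27:05' -> ('B', ['27', '05'])."""
    if allele.startswith("HLA-"):
        allele = allele[4:]
    gene, _, fields = allele.partition("*")
    return gene, fields.split(":")


def allele_matches(call: str, target: str) -> bool:
    """True if a called allele lies within a target of equal or lower resolution."""
    call_gene, call_fields = _split(call)
    target_gene, target_fields = _split(target)
    return (
        call_gene == target_gene
        and len(target_fields) <= len(call_fields)
        and call_fields[: len(target_fields)] == target_fields
    )


def find_associations(
    calls: Sequence[str], table: Sequence[HlaAssociation]
) -> List[HlaAssociation]:
    """Return rows for which every listed allele matches at least one participant call."""
    # Calls are unphased: carrying both alleles of a haplotype is treated as
    # carrying the haplotype (a simplifying assumption).
    return [
        row for row in table
        if all(any(allele_matches(c, a) for c in calls) for a in row.alleles)
    ]


if __name__ == "__main__":
    # Well-established associations used for illustration; odds ratios omitted.
    table = [
        HlaAssociation("ankylosing spondylitis", ("B*27",)),
        HlaAssociation("celiac disease", ("DQA1*05:01", "DQB1*02:01")),
    ]
    calls = ["B*27:05", "B*08:01", "DQA1*05:01", "DQA1*01:02", "DQB1*02:01", "DQB1*06:02"]
    for row in find_associations(calls, table):
        print(row.disease, row.alleles)
```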
ADMIXTURE software (Alexander et al., 2009) was used to estimate
participant ancestry against the HapMap3 reference dataset, which
contains 1.6 million common single nucleotide polymorphisms (SNPs) in
1184 reference individuals from 11 populations (Altshuler et al., 2010).
Bash scripting and PLINK software were used to remove duplicate SNPs and
SNPs with chromosome or position mismatches. Prior to ancestry
estimation, variants in linkage disequilibrium (LD) with r² > 0.2 within
a 50 kb window were pruned, and variants falling within genomic ranges of
known high-LD structure were excluded. In-house scripts were used to
generate a visual summary of the output.
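The LD pruning step can be illustrated with the simplified, greedy
Python sketch below, assuming genotypes coded as 0/1/2 allele dosages and
pairwise r² computed as the squared Pearson correlation of dosages. It
is not PLINK's implementation: PLINK's --indep-pairwise option uses a
sliding window with a step size and its own rule for which variant of a
correlated pair to remove, and the function names here are hypothetical.
The 50 kb window and r² threshold of 0.2 are taken from the text;
high-LD regions are supplied as (chromosome, start, end) ranges.

```python
from typing import Iterable, List, Sequence, Tuple

# (variant_id, chromosome, position_bp, genotype dosages 0/1/2 across samples)
Variant = Tuple[str, str, int, Sequence[float]]
Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp), inclusive


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two dosage vectors (0.0 if either is monomorphic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def in_regions(chrom: str, pos: int, regions: Iterable[Region]) -> bool:
    """True if the position falls inside any listed region on the same chromosome."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def ld_prune(
    variants: Sequence[Variant],
    window_bp: int = 50_000,
    r2_max: float = 0.2,
    high_ld_regions: Sequence[Region] = (),
) -> List[str]:
    """Keep a variant unless r^2 > r2_max with an already-kept variant within window_bp.

    Variants must be sorted by chromosome, then position.
    """
    kept: List[Variant] = []
    for vid, chrom, pos, dosages in variants:
        if in_regions(chrom, pos, high_ld_regions):
            continue  # drop variants inside known high-LD regions
        linked = any(
            k_chrom == chrom
            and pos - k_pos <= window_bp
            and r_squared(dosages, k_dos) > r2_max
            for _, k_chrom, k_pos, k_dos in kept
        )
        if not linked:
            kept.append((vid, chrom, pos, dosages))
    return [v[0] for v in kept]
```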
Red blood cell (RBC) and platelet antigens were predicted from
participant GS data as described elsewhere (Lane et al., 2016). Predicted
ABO and Rh blood types, rare RBC and/or platelet antigens, and their
implications for blood donation and transfusion were reported following
a framework for reporting GS results in generally healthy individuals
(Vassy et al., 2015).
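The final translation from predicted alleles to the ABO and RhD types
shown on a report (labels such as A+ or O-) can be sketched as below.
This toy Python example is not the prediction method of Lane et al.
(2016): it assumes alleles have already been collapsed into A, B, or O
groups and RHD into a count of functional copies, and it deliberately
ignores subgroups (e.g. A2), weak and partial D, and other complexities a
real predictor must handle. All names are hypothetical.

```python
from typing import Tuple

ABO_GROUPS = {"A", "B", "O"}


def abo_phenotype(allele1: str, allele2: str) -> str:
    """Map two ABO allele groups to a phenotype: A and B are codominant, O is recessive."""
    alleles = {allele1, allele2}
    if not alleles <= ABO_GROUPS:
        raise ValueError(f"unexpected ABO allele group(s): {sorted(alleles - ABO_GROUPS)}")
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def rhd_phenotype(functional_rhd_copies: int) -> str:
    """RhD-positive if at least one functional RHD copy is predicted (ignores weak/partial D)."""
    return "RhD+" if functional_rhd_copies >= 1 else "RhD-"


def blood_type_label(abo_alleles: Tuple[str, str], functional_rhd_copies: int) -> str:
    """Compose a report label such as 'A+' or 'O-'."""
    sign = "+" if rhd_phenotype(functional_rhd_copies) == "RhD+" else "-"
    return abo_phenotype(*abo_alleles) + sign
```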
SARS-CoV-2 viral lineage was determined by viral GS at the Ontario
Institute for Cancer Research. Viral lineage designations follow the
standardized Phylogenetic Assignment of Named Global Outbreak Lineages
(Pango) and World Health Organization (WHO) nomenclatures. Raw,
de-identified viral and GS data were shared with open- and
controlled-access databases with participants’ consent, as outlined in
Table 3. Finally, all reporting elements were compiled automatically into
a final report using scripts developed in-house. Multiple quality control
steps were taken to ensure that participant data were correctly compiled
according to study ID(s).
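A minimal sketch of how automated compilation can enforce study-ID
consistency is shown below. The in-house scripts and QC steps are not
described in detail here, so the section names, the component format,
and the specific checks (ID match, known sections only, no duplicates, no
missing sections) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical section keys mirroring the report elements described in the text.
REPORT_SECTIONS = (
    "monogenic", "carrier_status", "pharmacogenomics", "prs",
    "ancestry", "hla", "blood_antigens", "viral_lineage",
)


class ReportQCError(ValueError):
    """Raised when a report component fails a quality control check."""


def compile_report(study_id: str, components: Sequence[Dict[str, str]]) -> str:
    """Assemble one participant's report, refusing components that fail QC.

    Each component is a dict with keys 'study_id', 'section', and 'content'.
    """
    sections: Dict[str, str] = {}
    for comp in components:
        # QC 1: every component must carry the participant's study ID.
        if comp["study_id"] != study_id:
            raise ReportQCError(
                f"{comp['section']}: study ID {comp['study_id']!r} does not match {study_id!r}"
            )
        # QC 2: only expected sections are accepted.
        if comp["section"] not in REPORT_SECTIONS:
            raise ReportQCError(f"unknown section {comp['section']!r}")
        # QC 3: no section may be supplied twice (e.g. from two pipeline runs).
        if comp["section"] in sections:
            raise ReportQCError(f"duplicate section {comp['section']!r}")
        sections[comp["section"]] = comp["content"]
    # QC 4: all expected sections must be present.
    missing: List[str] = [s for s in REPORT_SECTIONS if s not in sections]
    if missing:
        raise ReportQCError("missing sections: " + ", ".join(missing))
    lines = [f"Comprehensive Genome Report, study ID {study_id}"]
    lines += [f"[{s}] {sections[s]}" for s in REPORT_SECTIONS]
    return "\n".join(lines)
```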
RESULTS
A complete outline of the final workflow for pre-test counseling and
return of results is summarized in Figure 1 and described in detail
below.