Development of the Comprehensive Genome Report
Comprehensive genome reports were modelled on clinical reports from the Sinai Health System Laboratory (Toronto, ON) and reflect current recommendations for reporting clinical GS results (Miller et al., 2021; Green et al., 2013). In addition to monogenic disease risks and carrier status, reports include expanded information on pharmacogenomics, PRS, genetic ancestry, blood type, human leukocyte antigen (HLA) type, and viral lineage.
Reporting of pharmacogenomic variants is based on the Pharmacogenomics Knowledge Base (PharmGKB) (Hewett et al., 2002) and includes annotations of variant-drug interactions, clinical implications, and dosing recommendations. PharmGKB annotations were appended to star alleles called by Stargazer v1.0.8 from participant GS data (Lee et al., 2019). The genotypes of 17 pharmacogenes were identified, including structural variant analysis in CYP2D6. Additionally, the genotypes of two HLA loci, HLA-A and HLA-B, as well as rs12777823, were included. Custom Python scripts were used to assign the appropriate PharmGKB recommendations for each pharmacogene based on genotyping results. Genotypes that did not meet the Clinical Pharmacogenetics Implementation Consortium (CPIC) (Relling et al., 2011) Level A and/or PharmGKB Level 1A evidence criteria for a given gene-medication association were excluded from the final report.
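The evidence filter described above can be sketched as follows. This is a minimal illustration, not the study's script: the `Association` record, its field names, and the placeholder gene and drug names are hypothetical, and the sketch assumes that an association is kept when it reaches either CPIC Level A or PharmGKB Level 1A (the "and/or" in the text leaves the exact rule open).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Association:
    """One gene-medication association (hypothetical record layout)."""
    gene: str
    drug: str
    cpic_level: str      # CPIC levels run from "A" (strongest) to "D"
    pharmgkb_level: str  # PharmGKB levels run from "1A" (strongest) to "4"


def is_reportable(assoc: Association) -> bool:
    # Assumption: either top-tier rating is sufficient for inclusion.
    return assoc.cpic_level == "A" or assoc.pharmgkb_level == "1A"


def filter_reportable(associations: Iterable[Association]) -> List[Association]:
    """Drop associations that fall below the top evidence tier."""
    return [a for a in associations if is_reportable(a)]


# Placeholder data, for illustration only.
candidates = [
    Association("GENE1", "drug_x", cpic_level="A", pharmgkb_level="1A"),
    Association("GENE2", "drug_y", cpic_level="B", pharmgkb_level="2A"),
    Association("GENE3", "drug_z", cpic_level="C", pharmgkb_level="1A"),
]
reportable = filter_reportable(candidates)
```

If the stricter reading is intended (both ratings required), the `or` in `is_reportable` would become `and`.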
Reporting of PRS results is based on previously outlined patient experiences and preferences (Brockman et al., 2021), as well as previously validated, ancestry-adjusted PRS assays for six common conditions (type 2 diabetes, coronary artery disease, atrial fibrillation, as well as breast [female only], prostate [male only], and colon cancer) (Vassy et al., Preprint. DOI: 10.21203/rs.3.rs-743779/v1). Raw scores were adjusted for genetic ancestry using ancestry-informative principal components to calculate an adjusted PRS (Vassy et al., Preprint. DOI: 10.21203/rs.3.rs-743779/v1).
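One common way to adjust a raw score with principal components is to regress raw scores on the components in a reference cohort and report the standardized residual. The sketch below illustrates that general approach under stated assumptions; it is not necessarily the exact procedure of Vassy et al., and the function names and synthetic data are hypothetical.

```python
import numpy as np


def fit_ancestry_model(ref_scores, ref_pcs):
    """Fit raw_score ~ intercept + PCs by ordinary least squares.

    ref_scores: shape (n,), raw PRS in a reference cohort.
    ref_pcs:    shape (n, k), principal components for the same people.
    Returns the coefficient vector and the residual standard deviation.
    """
    ref_scores = np.asarray(ref_scores, dtype=float)
    design = np.column_stack([np.ones(len(ref_scores)), ref_pcs])
    beta, *_ = np.linalg.lstsq(design, ref_scores, rcond=None)
    residuals = ref_scores - design @ beta
    return beta, residuals.std(ddof=1)


def adjust_score(raw, pcs, beta, resid_sd):
    """Subtract the ancestry-predicted score and scale by the residual SD."""
    expected = beta[0] + np.dot(pcs, beta[1:])
    return (raw - expected) / resid_sd


# Synthetic reference cohort: the raw score drifts with the first PC.
rng = np.random.default_rng(0)
pcs = rng.normal(size=(500, 2))
raw = 2.0 + 3.0 * pcs[:, 0] + rng.normal(size=500)

beta, resid_sd = fit_ancestry_model(raw, pcs)
adjusted = adjust_score(raw, pcs, beta, resid_sd)
```

After adjustment, the scores in the reference cohort are centred on zero and no longer correlate with the components, so an individual's adjusted score can be compared across ancestry backgrounds on a common scale.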
Reads for 22 HLA loci, including HLA-A, -B, -C, -DQA1, -DQB1, -DPB1, and -DRB1, were extracted and aligned to the HLA v2 database from the HLA-VBSeq package, which was used to estimate the most probable HLA genotype from GS data at four-digit resolution (Mimori et al., 2019). HLA alleles associated with increased autoimmune disease risk were identified through a review of literature available through PubMed®. The final list included seven autoimmune diseases with significant HLA associations (type 1 diabetes, celiac disease, rheumatoid arthritis, ankylosing spondylitis, Behçet’s disease, multiple sclerosis, and Graves’ disease). An HLA-disease association database was developed with specific HLA haplotypes/genotypes, literature references, risk and/or odds ratios (OR), p-values, and interpretation of findings (Supplementary file 4). HLA calls from HLA-VBSeq were assigned to the appropriate autoimmune disease association using custom Python scripts.
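The assignment step can be illustrated with a short sketch that truncates each call to four-digit (two-field) resolution and looks it up in an association table. The function names are hypothetical, and the two table entries are well-known associations used only as placeholders; they are not taken from Supplementary file 4 and carry no effect sizes.

```python
from typing import Dict, Iterable, List

# Illustrative lookup only; the study's database also stores references,
# odds ratios, p-values, and interpretation text.
ASSOCIATIONS: Dict[str, List[str]] = {
    "B*27:05": ["ankylosing spondylitis"],
    "DQB1*03:02": ["type 1 diabetes"],
}


def to_four_digit(allele: str) -> str:
    """Truncate an allele name such as 'A*02:01:01:01' to 'A*02:01'.

    Simplification: any expression suffix on a later field is discarded.
    """
    gene, _, fields = allele.partition("*")
    return f"{gene}*{':'.join(fields.split(':')[:2])}"


def annotate(calls: Iterable[str]) -> Dict[str, List[str]]:
    """Map each called allele with a known association to its diseases."""
    hits: Dict[str, List[str]] = {}
    for call in calls:
        key = to_four_digit(call)
        if key in ASSOCIATIONS:
            hits[key] = ASSOCIATIONS[key]
    return hits


result = annotate(["A*02:01:01:01", "B*27:05:02", "DQB1*03:02:01"])
```

Normalizing every call to the same resolution before the lookup avoids missed matches when the caller reports more fields than the association table stores.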
ADMIXTURE software (Alexander et al., 2009) was used to estimate patient ancestry against the HapMap3 dataset, which contains 1.6 million common single nucleotide polymorphisms (SNPs) in 1184 reference individuals from 11 populations (Altshuler et al., 2010). Bash scripting and PLINK software were used to remove duplicate SNPs as well as chromosome and position mismatches. Variants used for ancestry estimation were found to be in linkage disequilibrium (LD) with an r² > 0.2 in a 50 kb window. Variants were excluded if they fell within genomic ranges of known high-LD structure. In-house scripts were used to generate a visual summary of the output.
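For readers unfamiliar with the LD statistic, the sketch below shows how pairwise r² can be computed from genotype dosages and how pairs within a 50 kb window exceeding a threshold of 0.2 can be identified. It illustrates the statistic only and is not the PLINK implementation; the function names and toy genotypes are hypothetical, and the genotype correlation used here is an approximation of haplotype-based r².

```python
import numpy as np


def r_squared(g1, g2) -> float:
    """Squared Pearson correlation between two dosage vectors (0/1/2)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0 or g2.std() == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r * r)


def pairs_in_ld(positions, genotypes, window_bp=50_000, threshold=0.2):
    """Return index pairs (i, j) within the window whose r² exceeds threshold.

    positions: sorted base-pair positions on a single chromosome.
    genotypes: array-like of shape (n_snps, n_samples).
    """
    hits = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[j] - positions[i] > window_bp:
                break  # positions are sorted, so later SNPs are farther away
            if r_squared(genotypes[i], genotypes[j]) > threshold:
                hits.append((i, j))
    return hits


# Toy data: SNPs 0 and 1 are close and perfectly correlated;
# SNP 2 is identical but lies outside the 50 kb window.
positions = [1_000, 20_000, 200_000]
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
]
ld_pairs = pairs_in_ld(positions, genotypes)
```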
Red blood cell (RBC) and platelet antigens were predicted from participant GS data, as described elsewhere (Lane et al., 2016). Predicted ABO and Rh blood types, rare RBC and/or platelet antigens, and implications for blood donation and transfusion were presented following a framework for reporting GS results for a generally healthy individual (Vassy et al., 2015).
SARS-CoV-2 viral lineage was determined by viral GS at the Ontario Institute for Cancer Research. Viral lineage follows standardized Phylogenetic Assignment of Named Global Outbreak (Pango) lineages and World Health Organization (WHO) nomenclature. Raw, de-identified viral and GS data were shared with open and controlled access databases with participants’ consent as outlined in Table 3. Finally, all reporting elements were compiled automatically into a final report using scripts developed in-house. Multiple quality control steps were taken to ensure that participant data were correctly compiled according to study ID(s).
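A study-ID check of the kind described could look like the sketch below, which refuses to assemble a report if any section was generated for a different participant. The in-house scripts are not published here, so the function name, section names, and record layout are all hypothetical.

```python
from typing import Any, Dict


def compile_report(study_id: str, sections: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-section results into one report after verifying study IDs.

    sections maps a section name to a dict with 'study_id' and 'content'.
    """
    mismatched = sorted(
        name for name, section in sections.items()
        if section["study_id"] != study_id
    )
    if mismatched:
        raise ValueError(
            f"Study ID mismatch for {study_id} in sections: {', '.join(mismatched)}"
        )
    report: Dict[str, Any] = {"study_id": study_id}
    for name, section in sections.items():
        report[name] = section["content"]
    return report


# Placeholder sections for a hypothetical participant.
sections = {
    "pharmacogenomics": {"study_id": "P001", "content": "placeholder PGx summary"},
    "hla": {"study_id": "P001", "content": "placeholder HLA summary"},
}
report = compile_report("P001", sections)
```

Failing loudly on a mismatch, rather than skipping the offending section, keeps a report from being issued with another participant's results.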
RESULTS
A complete outline of the final workflow for pre-test counseling and return of results is summarized in Figure 1 and described in detail below.