Objective 1 - HCI variant dictionary analysis
The HCI variant dictionary was analyzed to inform the design of LocalVar. Figure 3 shows that a small percentage of the total variants (1.2% in 2019, 1.1% in 2020, and 1.1% in 2021) were duplicate entries. These findings show that even high-quality data can benefit from tooling that detects the small percentage of duplicates in a variant collection. Of the variants in each snapshot of the variant dictionary, 37.8% in 2019, 35.4% in 2020, and 35.4% in 2021 were also found in the ClinVar variant summary files. These low percentages reflect the fact that HCI's affiliate labs often do not publicly release new variants to ClinVar. Of the variants also found in ClinVar, a few had interpretation conflicts (6.5% in 2019, 5.7% in 2020, and 4.6% in 2021). These conflicts were unchanged across the three snapshots. A majority of these conflicts (94%) were not clinically significant (“Benign/Likely benign” vs “Uncertain significance”). A small percentage (5.3%) could be clinically significant (“Pathogenic/Likely pathogenic” vs “Uncertain significance”). Only one conflict (0.2%) was clinically significant (“Pathogenic” vs “Benign”). The severity of each conflict type (not clinically significant, could be clinically significant, or clinically significant) is drawn from the ClinVar Miner study, in which all conflicts in ClinVar are categorized and analyzed18. While ClinVar is a widely used tool containing informative variant interpretations, HCI does not consider such public knowledge authoritative. However, the ability to detect and track these conflicts can assist variant review teams (such as the one at HCI) by providing a synthesis of published data via ClinVar that can help inform their decisions.
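The three conflict categories above can be expressed as a simple lookup. The following Python sketch is a minimal illustration, not LocalVar's implementation. It assumes interpretations arrive as ClinVar-style strings, collapses them into three hypothetical tiers (P, VUS, B), and treats disagreements within a tier (for example “Benign” vs “Likely benign”) as non-conflicting, which is a simplifying assumption. The names `conflict_severity` and `summarize` are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical grouping of ClinVar-style interpretation terms into three tiers.
# Unknown terms raise KeyError so that unexpected labels are not silently ignored.
_TIER = {
    "pathogenic": "P",
    "likely pathogenic": "P",
    "pathogenic/likely pathogenic": "P",
    "uncertain significance": "VUS",
    "likely benign": "B",
    "benign": "B",
    "benign/likely benign": "B",
}


def conflict_severity(local: str, clinvar: str) -> Optional[str]:
    """Return a severity label for a local-vs-ClinVar disagreement, or None if both share a tier."""
    a = _TIER[local.strip().lower()]
    b = _TIER[clinvar.strip().lower()]
    if a == b:
        return None  # same tier: not counted as a conflict in this sketch
    pair = {a, b}
    if pair == {"B", "VUS"}:
        return "not clinically significant"
    if pair == {"P", "VUS"}:
        return "could be clinically significant"
    return "clinically significant"  # the only remaining pair is P vs B


def summarize(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of conflicts falling into each severity category."""
    counts = Counter(conflict_severity(local, clinvar) for local, clinvar in pairs)
    counts.pop(None, None)  # drop pairs that were not conflicts
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Applied to each snapshot's list of (local, ClinVar) interpretation pairs, `summarize` yields proportions comparable to the 94%, 5.3%, and 0.2% breakdown reported above.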
Figure 4 shows that there were very few changes to the HGVS expressions of variants in the variant dictionary over the two-year recording period. From 2019–2020, there were 11 HGVS expression changes in the variant dictionary, compared with 700 ClinVar changes to the HGVS expressions of variants found in the variant dictionary. On closer inspection, 695 of those ClinVar changes (99.3%) were transcript updates. From 2020–2021, the number of HGVS expression changes within the variant dictionary rose to 190, but, as with ClinVar, most of them (185, or 97.4%) were transcript updates. ClinVar reported 505 HGVS expression changes over the same period, all of which were transcript updates. These findings highlight that transcript changes are common and may burden those tasked with keeping variant collections up to date. They also show that asynchronous updates from external sources such as ClinVar can support synonym detection and automated upkeep of variant records.
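One way to separate transcript updates from other HGVS changes is to compare the reference sequence accession and version of the old and new expressions. The sketch below assumes expressions of the form `NM_000059.3:c.68-7T>A`, optionally with a gene symbol in parentheses after the accession. The regular expression, the function names, and the rule that any accession or version change counts as a transcript update are illustrative assumptions, not the method used in this study.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed form: "<accession>[.<version>][(<gene>)]:<description>".
# Only two-letter RefSeq-style prefixes (NM_, NR_, NC_, NG_, ...) are handled here.
_HGVS = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+)(?:\.(?P<ver>\d+))?(?:\([^)]*\))?:(?P<desc>.+)$"
)


def split_hgvs(expr: str) -> Tuple[str, str, str]:
    """Split an HGVS expression into (accession, version, description); the gene symbol is dropped."""
    m = _HGVS.match(expr.strip())
    if m is None:
        raise ValueError(f"unrecognized HGVS expression: {expr!r}")
    return m.group("acc"), m.group("ver") or "", m.group("desc")


def classify_change(old: str, new: str) -> str:
    """Label a change between two recorded HGVS expressions for the same variant."""
    old_acc, old_ver, old_desc = split_hgvs(old)
    new_acc, new_ver, new_desc = split_hgvs(new)
    if (old_acc, old_ver) != (new_acc, new_ver):
        return "transcript update"  # reference sequence or its version changed
    if old_desc != new_desc:
        return "description change"  # same transcript, different variant description
    return "unchanged"  # differences, if any, were only in the gene symbol or whitespace


def tally_changes(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count change types across (old, new) expression pairs from two snapshots."""
    return Counter(classify_change(old, new) for old, new in pairs)
```

Under this rule, a pair such as `NM_000059.3:c.68-7T>A` and `NM_000059.4:c.68-7T>A` is counted as a transcript update, while a change in the `c.` description on the same transcript version is not.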
Figure 4 also shows that there were clinical interpretation changes in ClinVar (192 from 2019–2020, 244 from 2020–2021) that were not reflected in the HCI variant dictionary, which recorded far fewer interpretation changes (five from 2019–2020, 40 from 2020–2021). Caution is warranted when updating clinical interpretations based solely on ClinVar. A 2020 study by Xiang et al. tracked variants interpreted as “Pathogenic” and “Likely pathogenic” in ClinVar and found that, after manual interpretation of 326 qualifying variants, 40% were downgraded to benign, likely benign, or variant of uncertain significance, while only 2% were found more likely to be risk factors19. It would therefore be alarming not to find a high rate of interpretation conflicts when comparing a variant dictionary to ClinVar. However, notifying users that a change occurred, giving them access to evidence and supporting material, and letting them easily update their local variant interpretation would be useful features in a variant collection management tool. Table 1 summarizes the tooling needs drawn from the analysis of the HCI variant dictionary discussed above.
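The notification workflow described above can be sketched as a comparison of two ClinVar snapshots against the local dictionary that flags changes for review without overwriting anything. In this Python sketch, variants are keyed by a hypothetical shared identifier (for example, a ClinVar variation ID), interpretations are plain strings, and `ReviewItem`, `pending_reviews`, and `accept` are illustrative names. A real tool would also attach the evidence and supporting material mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReviewItem:
    variant_id: str
    local: str                     # current local interpretation
    clinvar_before: Optional[str]  # None if the variant was absent from the older snapshot
    clinvar_after: str             # interpretation in the newer ClinVar snapshot


def pending_reviews(
    local: Dict[str, str],
    clinvar_old: Dict[str, str],
    clinvar_new: Dict[str, str],
) -> List[ReviewItem]:
    """Flag local variants whose ClinVar interpretation changed between two snapshots.

    The local dictionary is never modified here; updates happen only through accept().
    """
    items = []
    for vid, local_interp in local.items():
        after = clinvar_new.get(vid)
        if after is None:
            continue  # not in the newer ClinVar snapshot, so nothing to report
        before = clinvar_old.get(vid)
        if before != after:  # also flags variants newly appearing in ClinVar
            items.append(ReviewItem(vid, local_interp, before, after))
    return sorted(items, key=lambda item: item.variant_id)


def accept(local: Dict[str, str], item: ReviewItem) -> None:
    """Apply a reviewed ClinVar interpretation to the local dictionary (an explicit user action)."""
    local[item.variant_id] = item.clinvar_after
```

Keeping `accept` separate from `pending_reviews` reflects the caution discussed above: ClinVar changes are surfaced to reviewers, but only an explicit decision alters the local record.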