Objective 1 - HCI variant dictionary analysis
The HCI variant dictionary was analyzed to inform the design of
LocalVar. Figure 3 shows that a small percentage of the total
variants (1.2% in 2019, 1.1% in 2020, and 1.1% in 2021) were
duplicate entries. These findings show that even with high-quality data,
there can be a need for tooling to detect the small percentage of
duplicates in variant collections. Of the variants in each snapshot of
the variant dictionary, 37.8% in 2019, 35.4% in 2020, and 35.4% in
2021 were also found in the ClinVar variant summary files. This limited
overlap arises because affiliate labs of HCI often do not publicly
release new variants to ClinVar. Of those that are also found
in ClinVar, a few had interpretation conflicts (6.5% in 2019, 5.7% in
2020, 4.6% in 2021). These conflicts were unchanged across the three
snapshots. A majority of these conflicts (94%) were not clinically
significant (“Benign/Likely benign” vs “Uncertain significance”). A
small percentage (5.3%) could be clinically significant
(“Pathogenic/Likely pathogenic” vs “Uncertain significance”). Only
one (0.2%) of these conflicts was clinically significant
(“Pathogenic” vs “Benign”). The severity of each conflict type
(not clinically significant, could be clinically significant, or
clinically significant) is drawn from the ClinVar Miner study, in which
all conflicts in ClinVar are categorized and analyzed [18]. While
ClinVar is a widely used tool containing informative variant
interpretations, HCI does not consider such public knowledge
authoritative. However, the ability to detect and track these conflicts
can assist variant review teams (such as the one at HCI) by providing a
synthesis of published data via ClinVar that can help inform their
decisions.
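The three severity levels described above can be illustrated with a minimal sketch. It is not LocalVar's implementation: the function name `conflict_severity`, the grouping of terms, and the handling of same-group or unrecognized terms are assumptions made for illustration. In particular, treating any pathogenic-group call against any benign-group call as clinically significant is an assumed generalization of the “Pathogenic” vs “Benign” case.

```python
# Hypothetical groupings of ClinVar-style interpretation terms (assumption).
BENIGN = {"benign", "likely benign"}
PATHOGENIC = {"pathogenic", "likely pathogenic"}
UNCERTAIN = {"uncertain significance"}


def _group(term):
    """Map an interpretation term to a coarse group, or None if unrecognized."""
    t = term.strip().lower()
    if t in BENIGN:
        return "benign"
    if t in PATHOGENIC:
        return "pathogenic"
    if t in UNCERTAIN:
        return "uncertain"
    return None


def conflict_severity(local_call, clinvar_call):
    """Classify a pair of interpretations into one of the severity levels."""
    a, b = _group(local_call), _group(clinvar_call)
    if a is None or b is None:
        return "unclassified"  # term outside the three groups (assumption)
    if a == b:
        return "no conflict"  # same group, e.g. Benign vs Likely benign (assumption)
    pair = {a, b}
    if pair == {"benign", "uncertain"}:
        return "not clinically significant"
    if pair == {"pathogenic", "uncertain"}:
        return "could be clinically significant"
    return "clinically significant"  # pathogenic group vs benign group
```

For example, `conflict_severity("Likely benign", "Uncertain significance")` falls in the not clinically significant category, which is the category that accounted for most conflicts in the analysis.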
Figure 4 shows that there were very few changes to the HGVS expressions
for the variants in the variant dictionary over the two-year recording
period. From 2019 to 2020, there were 11 HGVS expression changes in the
variant dictionary, compared with 700 ClinVar changes to the HGVS
expressions of variants found in the variant dictionary. On closer
inspection, 695 of those ClinVar changes (99.3%) were transcript
updates. From 2020 to 2021, the number of HGVS expression changes within
the variant dictionary rose to 190, but, as was the case with ClinVar,
185 of those changes (97.4%) were transcript updates. ClinVar reported
505 HGVS expression changes over that same period, and all of them were
transcript updates. These findings
highlight that transcript changes are common and may place a
burden on individuals tasked with keeping variant collections
up to date. They also show that asynchronous updates from external
sources, such as ClinVar, can support synonym detection and
automated upkeep of variant records.
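One way to separate transcript updates from other HGVS expression changes is sketched below. The text does not state the rule used in the analysis, so the rule here is an assumption: a change counts as a transcript update when the reference sequence accession or version before the colon differs. The function names and the example expressions (built on the placeholder accession `NM_000000`) are hypothetical and do not refer to real records.

```python
import re

# Simplified pattern for expressions such as "NM_000000.1(GENE1):c.100A>G".
# It covers only accession, optional version, optional gene symbol, and the
# variant description; it is not a full HGVS parser.
_HGVS = re.compile(
    r"(?P<accession>[A-Z]{2}_\d+)"
    r"(?:\.(?P<version>\d+))?"
    r"(?:\([^)]*\))?"
    r":(?P<description>.+)"
)


def _parse(expression):
    """Return (accession, version, description), or None if parsing fails."""
    match = _HGVS.fullmatch(expression.strip())
    if match is None:
        return None
    return match.group("accession"), match.group("version"), match.group("description")


def classify_hgvs_change(old, new):
    """Label a pair of HGVS expressions from two snapshots."""
    parsed_old, parsed_new = _parse(old), _parse(new)
    if parsed_old is None or parsed_new is None:
        return "unparsed"
    if parsed_old == parsed_new:
        return "unchanged"  # identical apart from an optional gene symbol
    if parsed_old[:2] != parsed_new[:2]:
        return "transcript update"  # accession or version differs (assumed rule)
    return "other change"
```

Under this rule, a pair such as `NM_000000.1:c.100A>G` and `NM_000000.2:c.100A>G` is labeled a transcript update, while a change to the description alone is labeled an other change and would still need manual review.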
Figure 4 also suggests that there were clinical interpretation changes
in ClinVar (192 from 2019–2020, 244 from 2020–2021) that were not
reflected in the HCI variant dictionary (5 from 2019–2020, 40 from
2020–2021). Caution is warranted when updating clinical interpretations
based solely on ClinVar. A 2020 study by Xiang et al. tracked variants
interpreted as “Pathogenic” and “Likely pathogenic” in ClinVar. After
manual interpretation of 326 qualifying variants, they found that 40%
were downgraded to benign, likely benign, or variant of uncertain
significance, while only 2% were found more likely to be risk
factors [19]. It would therefore be alarming not to find a high rate of
interpretation conflicts when comparing a variant dictionary to
ClinVar. However, letting users know
that a change occurred, giving them access to evidence and supporting
material, and giving them the option to easily update their local
variant interpretation can be a useful feature in a variant collection
management tool. Table 1 summarizes the tooling needs discussed above
that were drawn from the analysis of the HCI variant dictionary.
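The notify-rather-than-overwrite behavior described above can be sketched as a comparison of two snapshots. This is an illustrative assumption about how such a check might look, not a description of LocalVar: the function name, the dictionary-based snapshot format, and the variant identifiers are all hypothetical.

```python
def interpretation_changes(old_snapshot, new_snapshot):
    """List interpretation changes between two snapshots without applying them.

    Each snapshot is assumed to be a dict mapping a variant identifier to its
    clinical interpretation. Only variants present in both snapshots are compared.
    """
    changes = []
    for variant_id in sorted(old_snapshot.keys() & new_snapshot.keys()):
        before, after = old_snapshot[variant_id], new_snapshot[variant_id]
        if before != after:
            # Report the change for human review; the local record is untouched.
            changes.append((variant_id, before, after))
    return changes
```

A review team could then inspect each reported tuple alongside the supporting evidence and decide whether to update the local interpretation.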