Objective 1 - HCI variant dictionary analysis
For over two decades, HCI has maintained a variant dictionary for
tracking variants detected through research or clinical genetic
analysis. Each clinical variant is assigned one classification.
Generally, this is the original classification assigned by the clinical
lab that performs the testing. Detected variants whose classifications
are in conflict with ClinVar are reviewed by a team of genetic
counselors, physicians, and variant specialists who decide upon the
final classification to be stored with the variant in the dictionary.
These classifications may be revisited by the variant review team if a
clinical lab sends an update indicating a variant has been reclassified
based on their classification criteria, or if a variant already in the
dictionary is identified in a new patient and the clinical lab has
assigned a different classification.
Snapshots of the variant dictionary were pulled at three time points
spanning two years (2019-03-19, 2020-01-29, and 2021-03-02) and used to
gauge how the dictionary changed over that period. Although each entry
in the variant dictionary includes several fields, only the coding DNA
HGVS expression and the variant interpretation fields were used for
this study. The analysis focused on the following metrics:
- The number of entries added to the variant dictionary between each
time point.
- The number of added entries that have identical HGVS expressions to
existing entries (duplicates).
- The number of entries whose “interpretation” was changed between
each time point.
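As an illustration only, the three metrics above could be computed for a
pair of snapshots along the following lines. This is a minimal sketch,
not the code used in the study: it assumes each snapshot has been loaded
into a dict keyed by the record's unique identifier, with a
(HGVS expression, interpretation) tuple as the value, and it treats
"existing entries" as those present in the earlier snapshot. The
function name and data layout are hypothetical.

```python
def compare_snapshots(old, new):
    """Compare two snapshots of the variant dictionary.

    Both arguments are assumed to be dicts of the form
    {record_id: (hgvs_expression, interpretation)}.
    """
    # Entries whose identifier appears only in the later snapshot.
    added_ids = [rid for rid in new if rid not in old]

    # Added entries whose HGVS expression already existed in the
    # earlier snapshot are counted as duplicates (an assumption).
    existing_hgvs = {hgvs for hgvs, _ in old.values()}
    duplicate_ids = [rid for rid in added_ids if new[rid][0] in existing_hgvs]

    # Entries present in both snapshots whose interpretation differs.
    changed_ids = [
        rid for rid in old
        if rid in new and old[rid][1] != new[rid][1]
    ]

    return {
        "added": len(added_ids),
        "duplicates": len(duplicate_ids),
        "reinterpreted": len(changed_ids),
    }
```

Calling the function once for the 2019 and 2020 snapshots and once for
the 2020 and 2021 snapshots would give the per-interval counts.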
The variant dictionary includes unique identifiers for each record and
these identifiers were used to compare HGVS expressions and
interpretations across the three snapshots. ClinVar was used to
establish a point of reference for the rate of HGVS expression and
interpretation changes for variants in the variant dictionary. The
ClinVar tab_delimited archive includes monthly releases of a
variant_summary_YYYY-MM.txt.gz file [16]. Of the fields in this file,
only “AlleleID” (the identifier assigned by ClinVar to each simple
allele), “Name” (which contains the coding DNA HGVS expression), and
“ClinicalSignificance” (the clinical interpretation of the variant)
were used for this study. Three
ClinVar variant summary files were downloaded
(variant_summary_2019-03.txt, variant_summary_2020-01.txt,
variant_summary_2021-03.txt) that corresponded to the dates of the
three annual snapshots. These files were parsed and the coding DNA HGVS
expressions were compared to those in the variant dictionary. If a match
was found, the AlleleID was mapped to the variant’s unique identifier in
the variant dictionary. These associations were then tracked across the
three variant dictionary snapshots and three ClinVar variant summary
files (spanning two years) to determine the rate at which ClinVar
updated the same HGVS expressions as those stored by HCI in the variant
dictionary. To determine changes in clinical interpretation, the
“ClinicalSignificance” field of the variant summary file was compared
to the interpretation assigned by the HCI variant review team.
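The matching step described above can be sketched as follows. This is a
hedged illustration rather than the study's implementation: it assumes
the header line of the variant summary file begins with "#AlleleID" (so
the leading "#" is stripped), that the dictionary is available as a dict
of {record_id: hgvs_expression}, and that a match means exact string
equality after trimming whitespace. The text does not say how HGVS
expressions were normalized before comparison. The function names are
hypothetical.

```python
import csv


def read_variant_summary(handle):
    """Yield one dict per row of a tab-delimited variant summary file."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Strip the leading '#' from the first header name, if present.
        yield {
            key.lstrip("#"): value
            for key, value in row.items()
            if key is not None
        }


def map_allele_ids(dictionary, summary_rows):
    """Map dictionary record IDs to (AlleleID, ClinicalSignificance).

    `dictionary` is assumed to be {record_id: hgvs_expression}.
    Matching is by exact string equality, which is an assumption.
    """
    # Index the dictionary by HGVS expression; duplicates share a key.
    records_by_hgvs = {}
    for record_id, hgvs in dictionary.items():
        records_by_hgvs.setdefault(hgvs.strip(), []).append(record_id)

    mapping = {}
    for row in summary_rows:
        for record_id in records_by_hgvs.get(row["Name"].strip(), []):
            mapping[record_id] = (row["AlleleID"], row["ClinicalSignificance"])
    return mapping
```

A compressed release could be read by passing
gzip.open(path, "rt", encoding="utf-8") to read_variant_summary. The
ClinicalSignificance value in each mapped pair can then be compared with
the interpretation stored for that record in the variant dictionary.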
Generally, the ClinVar variant summary file assigns a single
classification to each AlleleID. Thus, although multiple conflicting
interpretations are common in ClinVar, the variant summary file usually
presents a single authoritative interpretation for each AlleleID. The
exception is the most recent variant summary file used
(variant_summary_2021-03.txt), in which 32 AlleleIDs with conflicting
interpretations were found. This had no effect on our analysis,
however, because the HGVS expressions for those 32 AlleleIDs are not
present in the HCI variant dictionary.
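A check of this kind could look like the sketch below. It assumes that
"conflicting" means an AlleleID appearing in the file with more than one
distinct ClinicalSignificance value, which is an interpretation of the
text rather than something it states. It also assumes rows are dicts
with "AlleleID" and "ClinicalSignificance" keys and that the mapped
dictionary records are available as {record_id: allele_id}. All names
are hypothetical.

```python
from collections import defaultdict


def conflicting_allele_ids(rows):
    """Return AlleleIDs listed with more than one ClinicalSignificance."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["AlleleID"]].add(row["ClinicalSignificance"])
    # Keep only AlleleIDs with two or more distinct values.
    return {
        allele_id: sorted(values)
        for allele_id, values in seen.items()
        if len(values) > 1
    }


def records_affected(conflicts, allele_by_record):
    """List dictionary record IDs mapped to a conflicting AlleleID."""
    return sorted(
        record_id
        for record_id, allele_id in allele_by_record.items()
        if allele_id in conflicts
    )
```

An empty list from records_affected would correspond to the situation
described above, where none of the conflicting AlleleIDs map to an entry
in the variant dictionary.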