Objective 1 - HCI variant dictionary analysis
For over two decades, HCI has maintained a variant dictionary for tracking variants detected through research or clinical genetic analysis. Each clinical variant is assigned one classification. Generally, this is the original classification assigned by the clinical lab that performed the testing. Detected variants whose classifications conflict with ClinVar are reviewed by a team of genetic counselors, physicians, and variant specialists, who decide on the final classification to be stored with the variant in the dictionary. The variant review team may revisit a classification if a clinical lab sends an update indicating that a variant has been reclassified under the lab's classification criteria, or if a variant already in the dictionary is identified in a new patient and the clinical lab has assigned a different classification.
Snapshots of the variant dictionary were pulled at three time points spanning two years (2019-03-19, 2020-01-29, and 2021-03-02) and used to gauge how the dictionary changed over that period. Although each entry in the variant dictionary includes several fields, only the coding DNA HGVS expression and the variant interpretation fields were used in this study. The following metrics were the focus of the analysis:
The variant dictionary includes a unique identifier for each record, and these identifiers were used to compare HGVS expressions and interpretations across the three snapshots. ClinVar was used as a point of reference for the rate of HGVS expression and interpretation changes for variants in the variant dictionary. As part of the ClinVar tab_delimited archive, there are monthly releases of a variant_summary_YYYY-MM.txt.gz file [16]. This file contains several fields, of which only three were used in this study: “AlleleID” (the identifier assigned by ClinVar to each simple allele), “Name” (which contains the coding DNA HGVS expression), and “ClinicalSignificance” (the clinical interpretation of the variant). Three ClinVar variant summary files were downloaded (variant_summary_2019-03.txt, variant_summary_2020-01.txt, variant_summary_2021-03.txt), corresponding to the dates of the three annual snapshots. These files were parsed and the coding DNA HGVS expressions were compared to those in the variant dictionary. If a match was found, the AlleleID was mapped to the variant’s unique identifier in the variant dictionary. These associations were then tracked across the three variant dictionary snapshots and the three ClinVar variant summary files (spanning two years) to determine the rate at which ClinVar updated the same HGVS expressions as those stored by HCI in the variant dictionary. To determine changes in clinical interpretation, the “ClinicalSignificance” field of the variant summary file was compared to the interpretation assigned by the HCI variant review team. Generally, the ClinVar variant summary file assigns a single classification to each AlleleID. This means that although multiple conflicting interpretations are common in ClinVar, the variant summary file usually presents a single authoritative interpretation for each AlleleID.
The exception is the most recent variant summary file used (variant_summary_2021-03.txt), in which 32 AlleleIDs with conflicting interpretations were found. However, this had no effect on our analysis because the HGVS expressions for those 32 AlleleIDs are not present in the HCI variant dictionary.
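The matching and tracking steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the in-memory dictionary layout ({record identifier: (HGVS expression, interpretation)}), the exact-string matching rule, and the toy rows are all assumptions made for illustration. Only the three column names (“AlleleID”, “Name”, “ClinicalSignificance”) come from the text.

```python
import csv
import io

HGVS, INTERPRETATION = 0, 1  # positions inside each (hgvs, interpretation) tuple


def load_clinvar_summary(handle):
    """Parse a variant summary file into {AlleleID: (Name, ClinicalSignificance)}."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumption: the header line may begin with '#', so strip it before lookup.
    header = [column.lstrip("#") for column in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    records = {}
    for row in reader:
        if not row:
            continue
        records[row[col["AlleleID"]]] = (
            row[col["Name"]].strip(),
            row[col["ClinicalSignificance"]].strip(),
        )
    return records


def map_allele_ids(dictionary, clinvar):
    """Map dictionary record IDs to AlleleIDs by exact HGVS string match (assumed rule)."""
    name_to_allele = {name: allele_id for allele_id, (name, _) in clinvar.items()}
    return {
        record_id: name_to_allele[hgvs]
        for record_id, (hgvs, _) in dictionary.items()
        if hgvs in name_to_allele
    }


def changed_ids(earlier, later, field, ids=None):
    """Return IDs present in both snapshots whose value at `field` differs."""
    shared = earlier.keys() & later.keys()
    if ids is not None:
        shared &= set(ids)
    return {key for key in shared if earlier[key][field] != later[key][field]}


# Toy data only: these rows and HGVS strings are invented placeholders.
clinvar_early = load_clinvar_summary(io.StringIO(
    "#AlleleID\tType\tName\tClinicalSignificance\n"
    "1\tSNV\tNM_000000.0(GENE1):c.1A>G\tUncertain significance\n"
    "2\tSNV\tNM_000000.0(GENE2):c.2C>T\tBenign\n"
))
clinvar_late = load_clinvar_summary(io.StringIO(
    "#AlleleID\tType\tName\tClinicalSignificance\n"
    "1\tSNV\tNM_000000.0(GENE1):c.1A>G\tLikely pathogenic\n"
    "2\tSNV\tNM_000000.0(GENE2):c.2C>T\tBenign\n"
))
dictionary_early = {
    "rec1": ("NM_000000.0(GENE1):c.1A>G", "Uncertain significance"),
    "rec2": ("NM_000000.0(GENE3):c.3G>A", "Pathogenic"),
}

# Link dictionary records to AlleleIDs, then count interpretation changes
# in ClinVar among the linked alleles only.
mapping = map_allele_ids(dictionary_early, clinvar_early)
changed = changed_ids(clinvar_early, clinvar_late, INTERPRETATION, ids=mapping.values())
rate = len(changed) / len(mapping)
```

Because `changed_ids` only needs a mapping from identifier to an (HGVS, interpretation) pair, the same function can compare two variant dictionary snapshots keyed by record identifier. In practice, the HGVS strings would likely need normalization before comparison, which this sketch does not attempt.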