Objective 2 - LocalVar tool creation
The justification for the functionality of LocalVar is included in the results of the first study objective. This section details the development of this functionality and other design choices for the LocalVar tool. A Flask web application architecture was chosen so that the various GA4GH Python modules that provide VRS identifier generation could be integrated directly. Because a Flask application is typically less versatile than a JavaScript web application, an optional Dockerfile was also included to assist in environment setup.
The tool was designed to be initialized with the upload of a .csv file representing a given institution’s variant collection. This format was chosen because it is a common export type for SQL databases, Excel, and other storage services that institutions maintaining variant collections may currently use. The tool was created to be institution-agnostic, so users are prompted to select the name of the column containing the HGVS expressions and the name of the column containing the variant interpretations. The tool then automatically creates a VRS identifier for each entry in the file and places it in a newly added “VRS” column. The merits of VRS identifiers and a justification for their inclusion are provided in the discussion section of this study. The VRS identifiers are generated using the HGVS-to-VRS Allele identifier Python code provided by the GA4GH vrs-python repository on GitHub [17].
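The column-selection and identifier-generation step described above can be sketched as follows. This is a minimal illustration rather than the LocalVar source: the function name `add_vrs_column` and the `to_vrs_id` callable are hypothetical, and the callable stands in for the vrs-python HGVS translation code, which is not reproduced here. Leaving the “VRS” cell blank when translation fails is also an assumption.

```python
import csv
import io
from typing import Callable


def add_vrs_column(
    csv_text: str,
    hgvs_column: str,
    to_vrs_id: Callable[[str], str],
    vrs_column: str = "VRS",
) -> str:
    """Return the uploaded CSV text with a VRS identifier column appended.

    `to_vrs_id` is a hypothetical stand-in for the vrs-python translation
    code: it takes an HGVS expression and returns a VRS Allele identifier.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if hgvs_column not in fieldnames:
        raise ValueError(f"column {hgvs_column!r} not found in upload")
    if vrs_column not in fieldnames:
        fieldnames.append(vrs_column)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        hgvs = (row[hgvs_column] or "").strip()
        try:
            row[vrs_column] = to_vrs_id(hgvs) if hgvs else ""
        except Exception:
            # Assumption: an expression that cannot be translated leaves the
            # cell blank instead of aborting the whole upload.
            row[vrs_column] = ""
        writer.writerow(row)
    return out.getvalue()
```

Because the column name is passed in and not hard-coded, the same routine can serve collections with different export layouts, which is the institution-agnostic behavior the prompt is meant to provide.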
An integral part of the LocalVar functionality is the creation of “HGVS bins” that are subsequently used to detect synonyms and interpretation conflicts/updates. These bins are stored as a JSON object with the HGVS expression as the key and the unique collection ID and variant interpretation as values, as shown in Figure 1. If that HGVS expression is present in ClinVar, additional values are added to the bin, as shown in Figure 2. These include the variationID associated with that HGVS expression, the synonymous HGVS expressions stored in ClinVar (each associated with the same variationID), and the ClinVar interpretation for that variant. The ClinVar-derived values are taken from the ClinVar variant summary file (variant_summary_YYYY-MM.txt.gz, part of the ClinVar tab_delimited archive), and LocalVar updates the bins asynchronously with each monthly release of that file.
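A minimal sketch of how such bins could be built and then extended with ClinVar data is given below. The exact layout of the bins is defined by Figures 1 and 2; the key names used here (`entries`, `collection_id`, `clinvar`, `variation_id`, `synonyms`) and both function names are hypothetical. The ClinVar records are assumed to have been parsed already from the variant summary file into dictionaries with `hgvs`, `variation_id`, and `interpretation` keys.

```python
import json


def build_bins(rows, id_col, hgvs_col, interp_col):
    """Group collection rows into bins keyed by HGVS expression."""
    bins = {}
    for row in rows:
        entry = bins.setdefault(row[hgvs_col], {"entries": []})
        entry["entries"].append(
            {"collection_id": row[id_col], "interpretation": row[interp_col]}
        )
    return bins


def annotate_with_clinvar(bins, clinvar_records):
    """Add ClinVar values to every bin whose HGVS expression is in ClinVar."""
    # Group ClinVar expressions by variationID so that synonyms can be found.
    by_variation = {}
    for rec in clinvar_records:
        group = by_variation.setdefault(
            rec["variation_id"],
            {"hgvs": set(), "interpretation": rec["interpretation"]},
        )
        group["hgvs"].add(rec["hgvs"])

    hgvs_to_variation = {
        hgvs: vid for vid, group in by_variation.items() for hgvs in group["hgvs"]
    }
    for hgvs, entry in bins.items():
        vid = hgvs_to_variation.get(hgvs)
        if vid is None:
            continue  # not in ClinVar: the bin keeps only the local values
        group = by_variation[vid]
        entry["clinvar"] = {
            "variation_id": vid,
            # Synonyms are the other expressions sharing this variationID.
            "synonyms": sorted(group["hgvs"] - {hgvs}),
            "interpretation": group["interpretation"],
        }
    return bins


rows = [
    {"id": "V1", "hgvs": "NM_A:c.1A>G", "interp": "Pathogenic"},
    {"id": "V2", "hgvs": "NM_X:c.9C>T", "interp": "Benign"},
]
clinvar = [  # illustrative records only, not real ClinVar data
    {"hgvs": "NM_A:c.1A>G", "variation_id": "100", "interpretation": "Likely pathogenic"},
    {"hgvs": "NC_1:g.5A>G", "variation_id": "100", "interpretation": "Likely pathogenic"},
]
bins = annotate_with_clinvar(build_bins(rows, "id", "hgvs", "interp"), clinvar)
print(json.dumps(bins, indent=2))
```

With this layout, a synonym is detected when another bin's key appears in a bin's `synonyms` list, and an interpretation conflict or update is detected by comparing the local interpretation with the ClinVar one in the same bin.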
Edits can come from the acceptance of any of the suggestions mentioned above, from the addition (single or bulk) or deletion of variant entries, or from manual changes to specific variant record fields. All edits made to variant records in the collection are time-stamped and stored by LocalVar in a JSON object with the unique collection identifier as the key and the edit events as values.
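The edit history described above could take a form similar to the sketch below. The function name `record_edit`, the fields stored for each event, and the `source` labels are hypothetical; only the overall shape (collection identifier as key, time-stamped edit events as values) follows the text.

```python
import json
from datetime import datetime, timezone


def record_edit(edit_log, collection_id, field, old_value, new_value,
                source="manual", now=None):
    """Append one time-stamped edit event under the record's collection ID.

    `source` is a hypothetical label for the edit's origin, for example
    "manual", "clinvar_suggestion", "addition", or "deletion".
    """
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    event = {
        "timestamp": timestamp,
        "source": source,
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
    }
    edit_log.setdefault(collection_id, []).append(event)
    return event


edit_log = {}
record_edit(edit_log, "V1", "interpretation", "Uncertain significance",
            "Likely pathogenic", source="clinvar_suggestion")
record_edit(edit_log, "V1", "hgvs", "NM_A:c.1A>G", "NM_A:c.2A>G")
print(json.dumps(edit_log, indent=2))
```

Appending events instead of overwriting them keeps the full history of each record, so that earlier interpretations remain recoverable after an update.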