Objective 2 - LocalVar tool creation
The justification for the functionality of LocalVar is included in the
results of the first study objective. This section details the
development of this functionality and other design choices for the
LocalVar tool. A Flask web application architecture was chosen so that
the GA4GH Python modules that provide VRS identifier generation could be
integrated directly. Because a Flask application is typically less
versatile than a JavaScript web application, an optional Dockerfile was
also included to assist in environment setup.
The tool was designed to be initialized with the upload of a .csv file
representing a given institution’s variant collection. This format was
chosen because it is a common export format for SQL databases, Excel, and
other storage services that institutions maintaining variant collections
may currently use. The tool was created to be
institution-agnostic, so a prompt is provided for users to select the
names of the column containing the HGVS expressions and the column
containing the variant interpretations. The tool then automatically
creates a VRS identifier for each entry in the file and places it in a
newly added “VRS” column. The merits of VRS
identifiers and a justification for their inclusion are provided in the
discussion section of this study. The VRS identifiers are generated
using the HGVS-to-VRS Allele identifier Python code provided by the
GA4GH vrs-python repository on GitHub [17].
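The column-selection and VRS-column steps described above can be sketched as follows. This is a minimal illustration rather than the LocalVar source: the function name add_vrs_column and the translate argument are hypothetical. In the tool itself the translation is performed by vrs-python, whose API is not reproduced here, so the sketch accepts any callable that maps an HGVS expression to an identifier string (or None on failure).

```python
import csv
import io
from typing import Callable, Optional


def add_vrs_column(
    csv_text: str,
    hgvs_column: str,
    translate: Callable[[str], Optional[str]],
    vrs_column: str = "VRS",
) -> str:
    """Return a copy of the uploaded CSV with a VRS identifier column appended.

    `translate` is a stand-in (assumption) for the vrs-python HGVS-to-VRS step.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or hgvs_column not in reader.fieldnames:
        raise ValueError(f"column {hgvs_column!r} not found in uploaded file")
    fieldnames = list(reader.fieldnames) + [vrs_column]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        expression = (row[hgvs_column] or "").strip()
        # Leave the cell empty when the expression is blank or cannot be translated.
        row[vrs_column] = (translate(expression) or "") if expression else ""
        writer.writerow(row)
    return out.getvalue()
```

Passing the translator in as an argument keeps the file handling independent of the identifier library, so the same routine can be exercised with a placeholder translator before vrs-python and its sequence data are configured.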
An integral part of the LocalVar functionality is the creation of “HGVS
bins” that are subsequently used to detect synonyms and interpretation
conflicts/updates. These bins are stored as a JSON object with the HGVS
expression as the key and the unique collection ID and variant
interpretation as values, as shown in Figure 1. If that HGVS expression
is present in ClinVar, additional values are added to the bin as shown
in Figure 2. These include the variationID associated with that HGVS
expression, synonymous HGVS expressions stored in ClinVar (each
associated with the same variationID), and the ClinVar interpretation
for that variant. These bins are asynchronously updated by LocalVar with
each monthly release of the ClinVar variant summary file
(variant_summary_YYYY-MM.txt.gz, part of the ClinVar tab_delimited
archive), which is the source of these ClinVar-derived data.
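A minimal sketch of the bin structure may help make this concrete. The exact layout shown in Figures 1 and 2 is not reproduced here, so the key names ("collection", "clinvar", "variation_id", "synonyms") and the three helper functions are assumptions chosen for illustration, as is the shape of the ClinVar records passed in. The conflict check is a simple case-insensitive comparison and is not claimed to be the rule LocalVar applies.

```python
import json
from typing import Dict, Iterable, List


def build_bins(rows: Iterable[dict], id_column: str, hgvs_column: str,
               interpretation_column: str) -> Dict[str, dict]:
    """Group collection entries into bins keyed by HGVS expression."""
    bins: Dict[str, dict] = {}
    for row in rows:
        entry = bins.setdefault(row[hgvs_column], {"collection": []})
        entry["collection"].append({
            "collection_id": row[id_column],
            "interpretation": row[interpretation_column],
        })
    return bins


def add_clinvar_data(bins: Dict[str, dict], clinvar_records: Iterable[dict]) -> None:
    """Attach ClinVar values to every bin whose HGVS key appears in ClinVar.

    Each record is assumed to look like
    {"variation_id": ..., "hgvs_expressions": [...], "interpretation": ...}.
    """
    for record in clinvar_records:
        expressions = record["hgvs_expressions"]
        for hgvs in expressions:
            if hgvs in bins:
                bins[hgvs]["clinvar"] = {
                    "variation_id": record["variation_id"],
                    # Other expressions sharing the same variationID.
                    "synonyms": [e for e in expressions if e != hgvs],
                    "interpretation": record["interpretation"],
                }


def find_conflicts(bins: Dict[str, dict]) -> List[str]:
    """Return HGVS keys whose local interpretation differs from ClinVar's."""
    conflicts = []
    for hgvs, entry in bins.items():
        clinvar = entry.get("clinvar")
        if clinvar is None:
            continue
        target = clinvar["interpretation"].strip().lower()
        if any(item["interpretation"].strip().lower() != target
               for item in entry["collection"]):
            conflicts.append(hgvs)
    return conflicts
```

Because the bins hold only strings, lists, and nested dictionaries, json.dumps(bins) serializes them directly, which matches the JSON storage described above.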
Edits can come from the acceptance of any of the suggestions mentioned
above, from the addition (single or bulk) or deletion of variant
entries, or from manual changes to specific variant record fields. All
edits made to variant records in the collection are time-stamped and
stored by LocalVar in a JSON object with the unique collection
identifier as the key and the edit events as values.
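The edit history described above could be represented along the following lines. This is a sketch under stated assumptions: the function record_edit and the event fields ("timestamp", "field", "old_value", "new_value", "source") are hypothetical names, since the source specifies only that events are time-stamped and keyed by the unique collection identifier.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def record_edit(edit_log: Dict[str, List[dict]], collection_id: str, field: str,
                old_value, new_value, source: str = "manual") -> dict:
    """Append one time-stamped edit event under the record's collection ID."""
    event = {
        # UTC timestamp in ISO 8601 form, so the log is JSON-serializable.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
        # Assumed labels, e.g. "manual", "suggestion", "bulk_add", "delete".
        "source": source,
    }
    edit_log.setdefault(collection_id, []).append(event)
    return event
```

For example, accepting a suggested interpretation update for a record with the identifier "V1" would be logged with record_edit(log, "V1", "interpretation", old, new, source="suggestion"), and json.dumps(log) then yields the stored object.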