Driving rare disease research
The ~40,000 openly consented patient records in DECIPHER contain more than 51,000 variants and ~172,000 phenotypes, and represent a rich dataset to drive rare disease research. Since its inception in 2004, DECIPHER has been cited more than 2,600 times in peer-reviewed publications (Fig. 7A), a testament to its impact on rare disease research. For some genes there is a large genotypic patient series, which allows, for example, the full spectrum of phenotypes associated with a gene to be recognised. At the time of writing, the genes with the most open access sequence variants were NF1 (162), ANKRD11 (123), ARID1B (107), KMT2A (107), and DDX3X (78) (Fig. 7B).
Search: To identify the most relevant patient records and gene information, DECIPHER offers a powerful search function that lets users search by many different categories, including gene, phenotype, HPO identifier, genomic position (in GRCh37 or GRCh38), chromosome band, pathogenicity, and inheritance. Advanced searches are supported, such as searching for multiple terms either from the same category (e.g. multiple phenotypes) or from different categories (e.g. gene plus phenotype). Results are displayed in a tabular format, in addition to genome browser-based representations.
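The idea of combining search terms within and across categories can be illustrated with a minimal sketch. This is not DECIPHER's implementation or API: the `Record` class, the `search` function, the toy records, and the choice to combine all terms with a logical AND are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified patient record (not the DECIPHER schema)."""
    patient_id: int
    genes: frozenset
    hpo_terms: frozenset


def search(records, genes=(), hpo_terms=()):
    """Return records that match every supplied gene AND every supplied HPO term.

    Assumption: terms are combined with AND, both within a category
    and across categories. An empty category places no constraint.
    """
    wanted_genes = set(genes)
    wanted_hpo = set(hpo_terms)
    return [
        r for r in records
        if wanted_genes <= r.genes and wanted_hpo <= r.hpo_terms
    ]


# Toy records invented for this example; they are not real patient data.
RECORDS = [
    Record(1, frozenset({"ARFGEF1"}), frozenset({"HP:0001250", "HP:0001263"})),
    Record(2, frozenset({"ARFGEF1"}), frozenset({"HP:0001263"})),
    Record(3, frozenset({"DDX3X"}), frozenset({"HP:0001250", "HP:0001263"})),
]

# Different categories: gene plus phenotype.
gene_and_phenotype = search(RECORDS, genes=["ARFGEF1"], hpo_terms=["HP:0001250"])

# Same category: multiple phenotypes.
two_phenotypes = search(RECORDS, hpo_terms=["HP:0001250", "HP:0001263"])

print([r.patient_id for r in gene_and_phenotype])
print([r.patient_id for r in two_phenotypes])
```

In this sketch the first query returns only the record that carries both the gene and the phenotype, while the second returns every record annotated with both HPO terms regardless of gene.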
Driving discovery: The genotype-linked phenotypic data allows, for example, new variant-disease associations to be discovered, such as loss-of-function variants in ARFGEF1 causing developmental delay and epilepsy (Thomas et al., 2021). The dataset also enables the phenotypic spectrum to be extended for new syndromes (e.g. Witteveen-Kolk syndrome, a SIN3A-related disorder; Balasubramanian et al., 2021), as well as for well-established syndromes (e.g. ALG13 congenital disorder of glycosylation; Alsharhan et al., 2021). It also permits the understanding of contiguous gene effects, such as that around ERF, which causes a novel craniosynostosis syndrome with varying degrees of intellectual disability (Calpena et al., 2021).
DDD Research variants: In addition to the openly consented patient data, DECIPHER openly shares the DDD research variants, which are variants of unknown significance identified in undiagnosed probands with developmental disorders in the DDD study. These include functional de novo variants and rare loss-of-function homozygous, compound heterozygous, and hemizygous variants in genes that are neither developmental disorder genes nor OMIM-morbid genes. At present this dataset comprises nearly 5,000 variants. High-level phenotype terms are provided for each variant (Fig. 7C). The number of patients with each variant in the DDD dataset is displayed, in addition to the number of patients identified in the GeneDx and Radboud University Medical Center de novo variant dataset, as described by Kaplanis et al. (2020). This dataset enables the discovery of new gene-disease associations.
Bulk data for research: The openly consented patient data is available for bulk download for research purposes, subject to a data access agreement. In bulk, the data can be used, for example, for developing new analytical methods, for understanding patterns of polymorphism, and for refining critical intervals to map genes involved in specific phenotypes and diseases. The dataset has recently been used to associate phenotypes with functional systems (Jabato et al., 2021), and to develop a new tool to assist clinical interpretation of CNVs (Requena et al., 2021). DECIPHER also shares the data in bulk for display, subject to a Data Display Agreement. This allows third-party variant analysis companies and academic genome browser providers such as Ensembl and UCSC to display the data, maximising the possibility of finding patient matches.
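As an illustration of what refining a critical interval can mean in practice, the sketch below computes the smallest region of overlap shared by a set of CNVs from patients with a common phenotype. It is a deliberately simplified example rather than a method described here: the function name, the `(chromosome, start, end)` tuple layout, and the coordinates are hypothetical, and a real analysis would also need to consider genome assembly, CNV type, breakpoint uncertainty, and incomplete penetrance.

```python
def smallest_region_of_overlap(cnvs):
    """Return the interval shared by all CNVs, or None if there is none.

    cnvs: iterable of (chromosome, start, end) tuples with closed,
    1-based coordinates on the same genome assembly (an assumption).
    """
    cnvs = list(cnvs)
    if not cnvs:
        return None

    chromosomes = {chrom for chrom, _, _ in cnvs}
    if len(chromosomes) != 1:
        # CNVs on different chromosomes cannot share an interval.
        return None

    # The shared region starts at the latest start and ends at the earliest end.
    start = max(s for _, s, _ in cnvs)
    end = min(e for _, _, e in cnvs)
    if start > end:
        return None
    return (chromosomes.pop(), start, end)


# Invented coordinates for three overlapping deletions on one chromosome.
example_cnvs = [("19", 100, 500), ("19", 300, 900), ("19", 250, 450)]
critical_interval = smallest_region_of_overlap(example_cnvs)
print(critical_interval)
```

Each additional overlapping patient can only keep the shared interval the same or shrink it, which is why a larger patient series helps narrow the list of candidate genes.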