De novo mutations (DNMs) play an important role in severe genetic disorders that reduce fitness. To better understand the role of DNMs in disease, it is important to determine the parent of origin and the timing of the mutational events that give rise to them, especially in sex-specific developmental disorders such as male infertility. However, currently available short-read sequencing approaches are poorly suited to phasing, which requires long, continuous DNA molecules that span both the DNM and one or more informative SNPs. To overcome these challenges, we optimised and implemented a multiplexed long-read sequencing approach using the Oxford Nanopore Technologies MinION platform. We focused specifically on improving target amplification, integrating long-read sequencing data with high-quality short-read sequencing data, and developing an anchored phasing computational method. This approach handled the phasing challenges inherent to long-range target amplification and the sequencing errors that accumulate in long-read data. In total, 77 of 109 DNMs (71%) were successfully phased and assigned a parent of origin. The majority of phased DNMs were prezygotic (90%); the accuracy of this assignment is supported by an average mutant allele frequency of 49.6% (standard error 0.84%), close to the 50% expected for heterozygous germline variants. This study demonstrates the benefits of an integrated short-read and long-read sequencing approach for large-scale DNM phasing.
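The core idea of read-based phasing can be sketched as follows. This is a generic read-backed phasing heuristic, not the anchored phasing method developed in the study. It assumes the parental origin of each allele at an informative heterozygous SNP is already known (for example, from trio short-read data), and the names `ReadObservation`, `phase_dnm`, `min_support` and `min_concordance` are hypothetical. Reads are assigned to a parental haplotype by their SNP base, and the DNM is placed on the haplotype whose reads carry it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReadObservation:
    """Bases seen on one long read spanning both the DNM and an informative SNP."""
    dnm_base: str  # base at the de novo mutation position
    snp_base: str  # base at the informative heterozygous SNP


def phase_dnm(
    reads: List[ReadObservation],
    dnm_alt: str,
    paternal_snp: str,
    maternal_snp: str,
    min_support: int = 5,          # hypothetical threshold
    min_concordance: float = 0.9,  # hypothetical threshold
) -> Optional[Tuple[str, float]]:
    """Return (parent of origin, fraction of that haplotype's reads carrying the DNM),
    or None when the evidence is insufficient or conflicting."""
    carrying: Counter = Counter()         # DNM-carrying reads per parental haplotype
    haplotype_reads: Counter = Counter()  # all reads assigned to each haplotype
    for read in reads:
        if read.snp_base == paternal_snp:
            haplotype = "paternal"
        elif read.snp_base == maternal_snp:
            haplotype = "maternal"
        else:
            continue  # SNP base matches neither parent, e.g. a sequencing error
        haplotype_reads[haplotype] += 1
        if read.dnm_base == dnm_alt:
            carrying[haplotype] += 1

    total_carrying = sum(carrying.values())
    if total_carrying < max(min_support, 1):
        return None
    parent, n = carrying.most_common(1)[0]
    if n / total_carrying < min_concordance:
        return None  # DNM observed on both haplotypes: ambiguous
    return parent, n / haplotype_reads[parent]


# Invented reads: the DNM allele "T" always co-occurs with the paternal SNP allele "A".
reads = [ReadObservation("T", "A")] * 10 + [ReadObservation("C", "G")] * 10
print(phase_dnm(reads, dnm_alt="T", paternal_snp="A", maternal_snp="G"))  # ('paternal', 1.0)
```

As a rough heuristic, a carrier fraction near 1.0 is consistent with a prezygotic origin, whereas a clearly lower fraction on the carrying haplotype hints at postzygotic mosaicism.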
Routine exome sequencing (ES) in individuals with neurodevelopmental disorders (NDD) remains inconclusive in >50% of cases. Research analysis of unsolved cases can identify novel candidate genes but is time-consuming, subjective, and hard to compare between labs. The field therefore needs automated and standardized assessment methods to prioritize candidates for matchmaking. We developed AutoCaSc (https://autocasc.uni-leipzig.de) based on our candidate scoring scheme (CaSc) and validated our approach using synthetic trios and real in-house trio ES data. AutoCaSc consistently ranked variants in validated novel NDD genes among the top three (94.5% of cases). In 93 real trio exomes, AutoCaSc identified 97.5% of the variants previously scored manually and additionally flagged high-scoring variants that had been missed in the manual evaluation. It identified candidate variants in previously undescribed NDD candidate genes (CNTN2, DLGAP1, SMURF1, NRXN3, PRICKLE1). AutoCaSc enables anyone to quickly screen a variant for its plausibility in NDD. Having contributed to >40 descriptions of NDD-associated genes, we provide usage recommendations based on this extensive experience. Our implementation can be integrated into analysis pipelines and therefore allows large cohorts to be screened for candidate genes. AutoCaSc enables even small labs to take part in standardized matchmaking collaborations and to contribute to the ongoing identification of novel NDD entities.
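The "top three ranks" validation metric can be illustrated with a minimal sketch. It assumes each synthetic trio contains one spiked-in variant in a known NDD gene among background variants that have already been scored; the scores are invented and do not reproduce the CaSc scoring scheme, and `rank_of_target` and `top_k_rate` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Scores = List[Tuple[str, float]]  # (variant_id, candidate score)


def rank_of_target(scores: Scores, target: str) -> int:
    """1-based rank of `target` when variants are sorted by descending score.

    sorted() is stable even with reverse=True, so tied variants keep input order.
    """
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    for rank, (variant_id, _) in enumerate(ordered, start=1):
        if variant_id == target:
            return rank
    raise ValueError(f"{target!r} is not among the scored variants")


def top_k_rate(trios: Dict[str, Tuple[Scores, str]], k: int = 3) -> float:
    """Fraction of synthetic trios whose spiked-in variant ranks within the top k."""
    hits = sum(rank_of_target(scores, target) <= k for scores, target in trios.values())
    return hits / len(trios)


# Invented scores for two hypothetical synthetic trios.
trios = {
    "trio_1": ([("v1", 4.2), ("spike", 3.9), ("v2", 1.0)], "spike"),
    "trio_2": ([("v3", 5.0), ("v4", 4.0), ("v5", 3.5), ("spike", 3.0)], "spike"),
}
print(top_k_rate(trios, k=3))  # 0.5
```

Here the spiked-in variant ranks second in `trio_1` (a hit) and fourth in `trio_2` (a miss), giving a top-three rate of 0.5.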
Next-generation phenotyping (NGP) is the application of advanced computer vision methods to medical imaging data such as portrait photos of individuals with rare disorders. NGP on portraits yields gestalt scores that can be used to select appropriate genetic tests and to interpret molecular data. Here, we report an exceptional case of a girl who was evaluated at the ages of eight and fifteen and enrolled in NGP diagnostics at the second evaluation. The girl had clinical features associated with Koolen-de Vries syndrome and a suggestive facial gestalt. However, chromosomal microarray (CMA), Sanger sequencing, multiplex ligation-dependent probe amplification (MLPA), and trio exome sequencing were inconclusive. Based on the highly indicative gestalt score for Koolen-de Vries syndrome, we decided to perform genome sequencing to evaluate non-coding variants as well. This analysis revealed a 4.7 kb deletion at the end of intron 6 of the KANSL1 gene, the smallest structural variant reported to date for this phenotype. The case illustrates how NGP can be integrated into the iterative diagnostic process of test selection and interpretation of sequencing results.
The Matchmaker Exchange (MME) was launched in 2015 to provide a robust mechanism for discovering novel disease-gene relationships. This federated network connects databases holding relevant data so that a match is made when two or more users are looking for the same gene (two-sided matchmaking). The number of unique genes present across MME has steadily increased; there are currently >13,520 unique genes (~68% of all protein-coding genes) connected across MME’s nodes: GeneMatcher, DECIPHER, PhenomeCentral, MyGene2, seqr, the Initiative on Rare and Undiagnosed Diseases, PatientMatcher, and the RD-Connect Genome-Phenome Analysis Platform. The dataset accessible across MME includes more than 120,000 cases from over 12,000 contributors in 98 countries. Potential new disease-gene relationships are discovered daily, and international collaborations are moving these connections forward to publication. The expansion of data sharing into routine clinical practice has extended access to discovery to even more individuals with undiagnosed rare genetic disease. MME supports connections to the literature (PubCaseFinder), to human and model organism resources (Monarch Initiative), and to model organism scientists (ModelMatcher). Efforts are underway to explore additional approaches to matchmaking where there is only one querier (one-sided matchmaking). Genomic matchmaking has proven its utility over the past 7 years and will continue to facilitate discoveries in the years to come.
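The gene-level logic of two-sided matchmaking can be reduced to a minimal sketch: a match exists when at least two distinct submitters have queried the same gene. This deliberately ignores phenotype comparison, the MME API, and federated query routing between nodes. The function name `two_sided_matches` and the submission data are hypothetical, and the case normalization assumes HGNC-style gene symbols.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def two_sided_matches(submissions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Genes queried by at least two distinct submitters, with those submitters.

    `submissions` holds (submitter, gene_symbol) pairs pooled from any node.
    """
    submitters_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for submitter, gene in submissions:
        # Normalize case so "jag1" and "JAG1" count as one gene (assumes HGNC-style symbols).
        submitters_by_gene[gene.upper()].add(submitter)
    return {
        gene: sorted(submitters)
        for gene, submitters in sorted(submitters_by_gene.items())
        if len(submitters) >= 2  # a single submitter alone is not a two-sided match
    }


# Invented submissions: lab_A's repeated JAG1 query does not count twice.
submissions = [
    ("lab_A", "JAG1"), ("lab_A", "JAG1"), ("lab_B", "jag1"), ("lab_C", "MORC2"),
]
print(two_sided_matches(submissions))  # {'JAG1': ['lab_A', 'lab_B']}
```

In this framing, one-sided matchmaking would differ in that a single querier searches data that nobody else has flagged as a candidate.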
The MORC2 gene encodes a ubiquitously expressed nuclear protein involved in chromatin remodeling, DNA repair, and transcriptional regulation. Heterozygous mutations in MORC2 have been associated with a spectrum of disorders affecting the peripheral nervous system, such as Charcot-Marie-Tooth disease (CMT2Z), a spinal muscular atrophy-like (SMA-like) phenotype with or without cerebellar involvement, and a developmental syndrome with impaired growth, craniofacial dysmorphism and axonal neuropathy (DIGFAN syndrome). This clinical variability, together with the increasing number of variants of unknown significance detected by next-generation sequencing, constitutes a serious diagnostic challenge. Here we report the characterization of an in vitro model, based on MORC2 overexpression in the neuroblastoma cell line SH-EP or in cortical neurons, for evaluating the pathogenicity of variants of unknown significance. We show that MORC2 mutants affect survival and trigger apoptosis over time in the SH-EP cell line. Furthermore, overexpression of these mutants in primary cortical neurons increases apoptotic cell death and decreases neurite outgrowth. Altogether, these approaches establish the pathogenicity of two new variants, p.G444R and p.H446Q, in three patients from two families. These new MORC2 variants are associated with autosomal dominant CMT and with a late adult-onset SMA-like phenotype, further broadening the spectrum of clinical manifestations associated with MORC2 mutations.
Pathogenic variants in JAG1 are known to cause Alagille syndrome (ALGS), a disorder that primarily affects the liver, lung, kidney and skeleton. Although cardiac symptoms are also frequently observed in ALGS, thoracic aortic aneurysms have been reported only sporadically, in post-mortem autopsies. Here we report two families with segregating JAG1 variants presenting with isolated aneurysmal disease, as well as the first histological evaluation of aortic aneurysm tissue from a JAG1 variant carrier. Our observations shed more light on the pathomechanisms behind aneurysm formation in JAG1 variant carriers and underline the importance of cardiovascular imaging in the clinical follow-up of individuals carrying JAG1 variants.
The vast volume of data generated by the next-generation sequencing revolution is overwhelming to sift through and interpret. Distinguishing functional from non-functional variants, and benign from pathogenic ones, remains a challenge. Of the roughly three billion bases in the genome, the genomes of any two individuals differ at only about 3 million sites (0.1%). Furthermore, only a small fraction of these variants are biologically relevant and, of those that are functional, only a handful actually drive disease pathology. While whole genome and exome sequencing have transformed our collective understanding of the role that genetics plays in disease pathogenesis, there are certain conditions and populations for which DNA-level data have failed to produce a molecular diagnosis. Patients of non-White race/non-European ancestry are disproportionately affected by “variants of unknown/uncertain significance” (VUS). This limits the scope of precision medicine for minority patients and perpetuates health disparities. VUS often include deep intronic and splicing variants, which are difficult to interpret from DNA alone. RNA analysis can illuminate the consequences of VUS, allowing them to be reclassified as pathogenic or benign. Here we review the critical future role of transcriptome analysis in clarifying VUS in both neoplastic and non-neoplastic diseases.
The ATP-binding cassette (ABC) transporter superfamily comprises membrane proteins that efflux various substrates across extra- and intracellular membranes. Mutations in ABC genes cause 21 human disorders or phenotypes with Mendelian inheritance, including cystic fibrosis, adrenoleukodystrophy, retinal degeneration, and defects in cholesterol and bile transport. Common polymorphisms and rare variants in ABC genes are associated with several complex phenotypes, such as gout, gallstones, and cholesterol levels. Overexpression or amplification of specific drug efflux genes contributes to multidrug resistance in chemotherapy. The conserved ATP-binding domains of ABC transporters define membership of the superfamily, and phylogenetic analysis groups the 48 human ABC transporters into seven distinct subfamilies. Although ABC genes are highly conserved across most vertebrate species, there is also considerable gene duplication, deletion, and evolutionary diversification.
Neural tube defects (NTDs) are congenital malformations resulting from abnormal embryonic development of the brain, spine, or spinal cord. The genetic etiology of human NTDs remains poorly understood despite intensive investigation. CIC, the human homolog of the Drosophila transcriptional repressor Capicua, has been reported to interact with ataxin-1 (ATXN1) and to participate in the pathogenesis of spinocerebellar ataxia type 1. Our previous study demonstrated that CIC loss-of-function (LoF) variants contribute to cerebral folate deficiency by downregulating folate receptor 1 (FOLR1) expression. Given the importance of folate transport in neural tube formation, we hypothesized that CIC variants could increase the risk of NTDs by reducing embryonic folate concentrations. In this study, we examined CIC variants in whole genome sequencing (WGS) data from 140 isolated spina bifida cases and identified 8 missense variants in CIC. We tested the pathogenicity of the observed variants through multiple in vitro experiments and determined that CIC variants decreased FOLR1 protein levels and planar cell polarity (PCP) pathway signaling in a human cell line (HeLa). In a murine cell line (NIH3T3), CIC LoF variants downregulated PCP signaling. Taken together, this study provides evidence supporting CIC as a risk gene for human NTDs.
The large majority of germline alterations identified in the DNA mismatch repair (MMR) gene PMS2, a low-penetrance gene for the cancer predisposition syndrome Lynch syndrome (LS, OMIM 120435), are variants of unknown significance (VUS). The inability to assess the pathogenicity of such VUS interferes with personalized healthcare. The complete in vitro MMR activity (CIMRA) assay, which requires only sequence information on the VUS, provides a functional-analysis-based tool suited for VUS classification. To derive a formula that translates CIMRA assay results for PMS2 VUS into odds of pathogenicity (OddsPath), we used a set of clinically classified PMS2 variants, supplemented by inactivating variants generated in an in cellulo genetic screen as proxies for pathogenic variants. Validation of this OddsPath revealed very high predictive values for PMS2 VUS. We conclude that, as for the higher-penetrance MMR genes MSH2, MLH1 and MSH6, this OddsPath provides an integral metric that can be incorporated into the upcoming American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) criteria for MMR gene VUS classification. This will represent a significant step toward enabling personalized healthcare for individuals suspected of having LS and their relatives.
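The study derives its own CIMRA-specific formula, which is not reproduced here. For orientation only, the sketch below shows the generic OddsPath estimate recommended by the ClinGen Sequence Variant Interpretation working group for functional assays: OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants in the control set and P2 is the proportion of pathogenic variants among controls giving a particular assay readout. The function name `odds_path` and all counts are hypothetical.

```python
def odds_path(
    pathogenic_controls: int,
    total_controls: int,
    pathogenic_with_readout: int,
    total_with_readout: int,
) -> float:
    """Generic OddsPath = [P2 * (1 - P1)] / [(1 - P2) * P1].

    P1: proportion of pathogenic variants among all control variants (prior).
    P2: proportion of pathogenic variants among controls that give a particular
        assay readout, e.g. "MMR-deficient" (posterior).
    """
    p1 = pathogenic_controls / total_controls
    p2 = pathogenic_with_readout / total_with_readout
    if p1 in (0.0, 1.0) or p2 == 1.0:
        raise ValueError("OddsPath is undefined when P1 is 0 or 1, or P2 is 1")
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


# Invented counts: 10 of 20 controls are pathogenic (P1 = 0.5);
# 9 of the 10 controls read out as deficient are pathogenic (P2 = 0.9).
print(round(odds_path(10, 20, 9, 10), 2))  # 9.0
```

An OddsPath above 1 means the readout raises the odds of pathogenicity relative to the prior. A value below 1 lowers them.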
Male infertility has become a serious health and social problem; infertility affects approximately 15% of couples worldwide, and the genetic and phenotypic heterogeneity of human infertility poses a substantial obstacle to effective diagnosis and therapy. A previous study reported that heterozygous mutations in solute carrier family 26 member 8 (SLC26A8, NG_033897.1) were causally linked to asthenozoospermia. Interestingly, in our research, we detected three deleterious heterozygous SLC26A8 mutations, each in one of three unrelated patients with teratozoospermia. All three mutations reduced SLC26A8 expression in transfected cells, whereas SLC26A8 expression was not disrupted in sperm from the affected individuals. Notably, two of the three heterozygous SLC26A8 mutations were inherited from the patients' fertile fathers. Considering also that homozygous Slc26a8 knockout male mice are infertile, we suggest that male infertility associated with SLC26A8 mutations follows a recessive inheritance pattern. Nevertheless, given that heterozygous SLC26A8 mutations were detected in infertile patients and that SLC26A8 is predominantly expressed in various germ cells during spermatogenesis, heterozygous SLC26A8 mutations may not be a direct genetic cause of male infertility but may contribute to it to some degree.
Mitogen-Activated Protein Kinase Kinase Kinase 7 (MAP3K7, MIM 602614) encodes the ubiquitously expressed transforming growth factor β (TGF-β)–activated kinase 1 (TAK1), which plays a crucial role in many cellular processes. Variants in the MAP3K7 gene have been linked to 2 distinct disorders: frontometaphyseal dysplasia type 2 (FMD2, MIM #617137) and cardiospondylocarpofacial syndrome (CSCF, MIM #157800). The fact that different variants can induce 2 distinct phenotypes suggests a genotype-phenotype correlation, but no side-by-side comparison has been done thus far to confirm this. Here we significantly expand the cohort and the description of clinical phenotypes for individuals with CSCF and FMD2 who carry variants in MAP3K7. We show that, in contrast to FMD2-causing variants, CSCF-causing variants in MAP3K7 have a loss-of-function effect. Additionally, we show that patients with pathogenic variants in MAP3K7 are at risk of cardiac disease and have symptoms associated with connective tissue disease, and that the clinical phenotype of CSCF overlaps with that of Noonan syndrome. Together, we provide evidence for a molecular fingerprint of FMD2- versus CSCF-causing MAP3K7 variants and conclude that variants in MAP3K7 should be considered in the differential diagnosis of patients with syndromic congenital cardiac defects and/or cardiomyopathy, syndromic connective tissue disorders, and Noonan syndrome.
Here we describe MyGene2, Geno2MP, VariantMatcher, and Franklin, databases that have made variant-level information, together with phenotypes or phenotypic features, available to researchers, clinicians, health care providers and patients. Following in the footsteps of the Matchmaker Exchange project, which connects exome, genome, and phenotype databases at the gene level, these databases plan to connect to each other using Data Connect, a standard from the Global Alliance for Genomics and Health (GA4GH) for the discovery and search of biomedical data.
Clinical and research laboratories use exome sequencing extensively because of its high diagnostic yield, cost savings, impact on clinical management, and efficacy for disease gene discovery. While the rate of disease gene discovery has steadily increased, only ~16% of genes in the genome have confirmed disease associations. Here we describe our diagnostic laboratory’s disease gene discovery and ongoing data-sharing efforts with GeneMatcher. In total, we submitted 246 candidates in 243 unique genes to GeneMatcher, of which 45.93% are now clinically characterized. Submissions with at least one case meeting our candidate gene reporting criteria were significantly more likely to be characterized as of October 2021 than those with no case meeting these criteria (p=0.025). We reported relevant findings related to these gene-disease associations for 480 probands. In 219 (45.63%) instances, these results were reclassifications after an initial candidate gene (uncertain) or negative report. Since 2013, we have co-authored 105 publications focused on delineating gene-disease associations. Diagnostic laboratories are pivotal to disease gene discovery efforts: they can screen phenotypes based on genotype matches, contact clinicians of relevant cases, and issue proactive reclassification reports. GeneMatcher is a critical resource in these efforts.
A major challenge in validating genetic causes for patients with rare diseases (RDs) is the difficulty of identifying other RD patients with overlapping phenotypes and variants in the same candidate gene. This process, known as matchmaking, requires robust data-sharing solutions to be effective. In 2014 we launched PhenomeCentral, an RD data repository capable of collecting computer-readable genotypic and phenotypic data for the purposes of RD matchmaking. Over the past 7 years, PhenomeCentral’s features have been expanded and its dataset has grown consistently. There are currently 1,615 users registered on PhenomeCentral, who have contributed over 12,000 patient cases. Most of these cases contain detailed phenotypic terms, and a significant portion also provide genomic sequence data or other forms of clinical information. Matchmaking within PhenomeCentral and with other data repositories in the Matchmaker Exchange has collectively resulted in over 60,000 matches, which have facilitated multiple gene discoveries. The collection of deep phenotypic and genotypic data also positions PhenomeCentral well to support the next generation of matchmaking initiatives that use genome sequencing data, ensuring that PhenomeCentral will remain a useful tool for solving undiagnosed RD cases in the years to come.
DECIPHER (https://www.deciphergenomics.org) is a free web platform for sharing anonymised phenotype-linked variant data from rare disease patients. Its dynamic interpretation interfaces contextualise genomic and phenotypic data to enable more informed variant interpretation, incorporating international standards for variant classification. DECIPHER supports almost all types of germline and mosaic variation in the nuclear and mitochondrial genome: sequence variants, short tandem repeats, copy-number variants and large structural variants. Patient phenotypes are deposited using Human Phenotype Ontology (HPO) terms, supplemented by quantitative data, and are aggregated to derive gene-specific phenotypic summaries. It hosts data from >250 projects from ~40 countries, openly sharing ~40,000 patient records containing >51,000 variants and >172,000 phenotype terms. The rich phenotype-linked variant data in DECIPHER drive rare disease research and diagnosis by enabling patient matching within DECIPHER and with other resources, and have been cited in >2,600 publications. In this paper, we describe the types of data deposited to DECIPHER, the variant interpretation tools, and the patient matching interfaces that make DECIPHER an invaluable rare disease resource.
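The idea of aggregating per-patient HPO terms into a gene-specific phenotypic summary can be sketched as a simple frequency table. This is not DECIPHER's implementation: it counts exact terms only, does not propagate annotations up the HPO hierarchy, and ignores quantitative data. The function name `phenotype_summary`, the placeholder gene `GENE_A`, and the patient records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def phenotype_summary(records: Iterable[Tuple[str, Set[str]]]) -> Dict[str, Dict[str, float]]:
    """Per gene, the fraction of patients annotated with each HPO term.

    `records` holds (gene_symbol, set_of_HPO_ids) pairs, one per patient.
    """
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    patients_per_gene: Counter = Counter()
    for gene, hpo_terms in records:
        patients_per_gene[gene] += 1
        term_counts[gene].update(hpo_terms)  # a set, so each term counts once per patient
    return {
        gene: {term: n / patients_per_gene[gene] for term, n in term_counts[gene].most_common()}
        for gene in patients_per_gene
    }


# Invented patients: HP:0001263 = global developmental delay, HP:0001250 = seizure.
records = [
    ("GENE_A", {"HP:0001263", "HP:0001250"}),
    ("GENE_A", {"HP:0001263"}),
]
print(phenotype_summary(records))  # {'GENE_A': {'HP:0001263': 1.0, 'HP:0001250': 0.5}}
```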
The use of whole-genome sequencing (WGS) has accelerated the pace of gene discovery and highlighted the need for open and collaborative data sharing in the search for novel disease genes and variants. GeneMatcher (GM) is designed to facilitate connections between researchers, clinicians, health-care providers and others to help identify additional patients with variants in the same candidate disease genes. The Illumina Clinical Services Laboratory offers a WGS test for patients with suspected rare and undiagnosed genetic disease and regularly submits potential candidate genes to GM to strengthen gene-disease relationships. We describe our experience with GM, including our criteria for evaluating candidate genes and our workflow for the submission and review process. We have made 69 submissions, 36 of which are currently active. Ten per cent of submissions have resulted in publications, and an additional 14 submissions are part of ongoing collaborations expected to result in publications.