Materials and Methods

Workflow of genetic variant data integration

Data selection and retrieval

In a recent study (Townend et al., 2018), we identified 13 genotype-phenotype databases containing RTT-specific MECP2 variation data. We evaluated each of these against two requirements for data integration: 1) the data must be available and permitted to be re-used and redistributed, and 2) each genetic variant must be described unambiguously. The latter means that the exact position (chromosome build and location) and the variation itself are available or retrievable by conversion, so that the variant can be described using the HGVS nomenclature. For this study, we selected eight databases and downloaded all MECP2 genetic variants with linked phenotype information from each of them: ClinVar (Landrum et al., 2016; https://www.ncbi.nlm.nih.gov/clinvar/), DECIPHER (Firth et al., 2009; https://decipher.sanger.ac.uk/), EVA (http://www.ebi.ac.uk), EVS (http://evs.gs.washington.edu), ExAC (Lek et al., 2016; http://exac.broadinstitute.org/), KMD (https://kmd.nih.go.kr), LOVD (Fokkema et al., 2011; MECP2 collection: https://databases.lovd.nl/shared/genes/MECP2), and RettBASE (Krishnaraj, Ho, & Christodoulou, 2017; http://mecp2.chw.edu.au/). Additionally, an anonymized dataset from local RTT patients was included (Maastricht Rett dataset, permission granted by Niet-WMO verklaring 2018-0597, Maastricht University METC approval). Data were obtained either with a database's integrated download function or by extraction from HTML (the availability of download functions is listed in Townend et al., 2018). Figure 1 shows the data processing (steps 1-3) and analysis (step 4) workflow of this study.
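
The two inclusion requirements can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the record layout, the field names, and the functions `is_unambiguous` and `is_eligible` are hypothetical, and the example records contain made-up values. The sketch only assumes that a variant counts as unambiguous when it carries either a complete genomic description (build, chromosome, position, reference and alternative allele) or an HGVS description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantRecord:
    """Hypothetical minimal variant record; field names are illustrative only."""
    source: str                   # name of the originating database
    reusable: bool                # requirement 1: re-use and redistribution permitted
    genome_build: Optional[str]   # e.g. "GRCh37"
    chromosome: Optional[str]
    position: Optional[int]
    ref: Optional[str]
    alt: Optional[str]
    hgvs: Optional[str] = None    # HGVS description, if the source provides one


def is_unambiguous(rec: VariantRecord) -> bool:
    """Requirement 2: position and variation are known, or an HGVS description exists."""
    coordinates = (rec.genome_build, rec.chromosome, rec.position, rec.ref, rec.alt)
    has_coordinates = all(value is not None for value in coordinates)
    return has_coordinates or bool(rec.hgvs)


def is_eligible(rec: VariantRecord) -> bool:
    """A record is eligible for integration only if both requirements hold."""
    return rec.reusable and is_unambiguous(rec)


# Made-up example records (assumptions for illustration, not study data).
records = [
    # Complete genomic description, re-use permitted.
    VariantRecord("db_a", True, "GRCh37", "X", 1000, "C", "T"),
    # No coordinates, but an HGVS description is given.
    VariantRecord("db_b", True, None, None, None, None, None,
                  hgvs="NM_004992.3:c.473C>T"),
    # Genome build missing and no HGVS description: ambiguous.
    VariantRecord("db_c", True, None, "X", 1000, "C", "T"),
    # Unambiguous, but redistribution is not permitted.
    VariantRecord("db_d", False, "GRCh37", "X", 1000, "C", "T"),
]

eligible_sources = [rec.source for rec in records if is_eligible(rec)]
print(eligible_sources)
```

In this sketch a record without a genome build is rejected even when a position is given, because a location cannot be interpreted without knowing the reference assembly it refers to.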