WES data analysis
We used VarSeq software version 1.5.0 (Golden Helix) to annotate and filter the variants. SNVs and indels were filtered by read depth (>10), Phred score (>20), and variant allele frequency (>0.35 for germline and >0.10 for somatic variants). Variant annotation was based on public population, clinical, and functional databases.
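The quality thresholds above can be illustrated with a minimal Python sketch. This is not the VarSeq implementation: the `VariantCall` record, its field names, and the `passes_quality` helper are hypothetical, and the sketch assumes the Phred score is a single per-variant quality value and that the variant allele frequency is computed as alternate reads over total depth.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; all are strict "greater than" cut-offs.
MIN_DEPTH = 10
MIN_PHRED = 20
MIN_VAF = {"germline": 0.35, "somatic": 0.10}


@dataclass
class VariantCall:
    """Hypothetical record for one SNV or indel call (field names are assumptions)."""
    depth: int       # total reads covering the site
    phred: float     # Phred-scaled quality of the call
    alt_reads: int   # reads supporting the alternate allele

    @property
    def vaf(self) -> float:
        # Variant allele frequency; defined as 0 when there is no coverage.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_quality(call: VariantCall, origin: str) -> bool:
    """Return True if the call exceeds all three thresholds for its origin."""
    return (
        call.depth > MIN_DEPTH
        and call.phred > MIN_PHRED
        and call.vaf > MIN_VAF[origin]
    )
```

For example, a call with 6 alternate reads out of 40 (VAF 0.15) would pass under the somatic threshold but not under the germline one.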
Germline variants with population frequencies above 0.5% or 1% for recessive and dominant models of inheritance, respectively, were filtered out. Somatic mutations in the tumor sample were obtained by excluding all germline variants. Next, variants were filtered based on Sequence Ontology annotation by RefSeq, and only coding non-synonymous variants, namely missense variants and loss-of-function (LoF) variants (essential splice site, frameshift, and stop-codon gain/loss), were retained for further analysis. In silico prediction of the pathogenicity of missense variants was based on six algorithms provided by the dbNSFP database (version 2.4). The potential damaging effect was also assessed using the VEP 32 script software package from Ensembl (https://www.ensembl.org/), and only variants predicted as pathogenic by at least five different tools were prioritized. All LoF variants were also prioritized. The final list of filtered variants was annotated using VarElect 33 and HPO 34 to rank genes associated with the specific phenotype of the patients. The variants were validated by visual inspection using the Integrative Genomics Viewer (IGV). The prioritized germline variants were classified according to the ACMG guidelines 35,36, using the VarSome tool 37. Supporting Information Figure S1 summarizes the approach for WES data analysis.
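The prioritization logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline actually run in VarSeq: the `AnnotatedVariant` record and all function names are hypothetical, the Sequence Ontology term names are assumed to be the ones corresponding to the listed consequence classes, and the count of damaging predictions is assumed to be precomputed across the tools used.

```python
from dataclasses import dataclass

# Sequence Ontology terms assumed to correspond to the LoF classes in the text.
LOF_TERMS = {
    "frameshift_variant",
    "stop_gained",
    "stop_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
}
# Maximum population frequency per inheritance model, as stated in the text.
MAX_POP_FREQ = {"recessive": 0.005, "dominant": 0.01}
MIN_DAMAGING_CALLS = 5  # "at least five different tools"


@dataclass
class AnnotatedVariant:
    """Hypothetical annotated variant (field names are assumptions)."""
    consequence: str         # Sequence Ontology term
    pop_freq: float          # population allele frequency, as a fraction
    damaging_calls: int = 0  # number of tools predicting a damaging effect


def is_rare(variant: AnnotatedVariant, model: str) -> bool:
    """Keep germline variants whose frequency is not above the model's threshold."""
    return variant.pop_freq <= MAX_POP_FREQ[model]


def is_prioritized(variant: AnnotatedVariant) -> bool:
    """All LoF variants pass; missense variants need enough damaging predictions."""
    if variant.consequence in LOF_TERMS:
        return True
    if variant.consequence == "missense_variant":
        return variant.damaging_calls >= MIN_DAMAGING_CALLS
    return False


def subtract_germline(tumor: set, germline: set) -> set:
    """Somatic candidates: tumor variant keys (chrom, pos, ref, alt) absent from germline."""
    return tumor - germline
```

Under these assumptions, a missense variant called damaging by four tools would be dropped, whereas a frameshift variant would be retained regardless of its prediction count.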
Two prioritized germline variants from candidate genes (CYP1A1 and CEP164) and mutational hotspots of the TERT promoter were investigated by Sanger sequencing (primer pairs are available upon request).