Bioinformatic analysis of exonic SREs
SRE predictions have poor specificity. Several factors contribute to the complexity of SRE prediction, including the diverse range of splicing regulatory motifs (Ke et al., 2011; X. H.-F. Zhang & Chasin, 2004) and the context-dependence of their activity (Fu & Ares Jr, 2014; Z. Wang & Burge, 2008). The activity and usage of a motif depend strongly on its surrounding sequence and on its location in the gene relative to the consensus splice sites. For instance, some ESS motifs, including G runs, can promote splicing when located in an intron (Z. Wang & Burge, 2008). Moreover, RNA secondary structure and chromatin state may also influence SRE accessibility, and thereby SRE usage (reviewed by Fu & Ares Jr, 2014; Hnilicová & Staněk, 2011).
Several datasets and prediction algorithms (Table 1) have already been used to identify SREs or to test whether a variant can potentially create or abolish SREs (reviewed by Grodecká, Buratti, and Freiberger (2017)). However, experimental studies have shown that these bioinformatic prediction tools have high false positive rates. For example, one of the largest studies to date (Houdayer et al., 2012) reported that predictions were confirmed for only 14% (15/108) of BRCA1 and BRCA2 variants predicted to alter ESEs using a combination of ESEfinder, RESCUE-ESE, PESE octamer, and HSF algorithms.
More recently, two studies have assessed both positive and negative predictive values of selected bioinformatic tools to determine variant effects on SREs. ΔtESRseq (using hexamer scores from Ke et al. (2011)) and ΔHZEI were reported to perform better than ΔΨ and EX-SKIP in an analysis of 154 variants (including 50 spliceogenic) from select exons of five genes (Soukarieh et al., 2016). The data from this study led the authors to postulate that the predictive performance of SRE-dedicated tools varies between genes and exons (Soukarieh et al., 2016). For example, the sensitivity of ΔtESRseq ranged from 67-100% and its specificity from 66-97%, depending on the gene and exon (Soukarieh et al., 2016). In another evaluation of ΔtESRseq, ΔHZEI, and EX-SKIP (Grodecká et al., 2017), analysis of only 20 variants (10 spliceogenic) from four genes found that ΔtESRseq had higher sensitivity (80%) but lower specificity (60%) compared to ΔHZEI and EX-SKIP (both 70% sensitivity, 70% specificity). However, given the limited sample sizes of these two studies (Grodecká et al., 2017; Soukarieh et al., 2016), it is difficult to have confidence in their assessment of the comparative performance of bioinformatic tools.
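To illustrate why sample size limits confidence in such comparisons, the following minimal sketch computes sensitivity, specificity, and predictive values from a 2x2 table, together with a 95% Wilson score interval. The counts are back-calculated from the percentages reported for ΔtESRseq in the 20-variant evaluation (80% of 10 spliceogenic variants, 60% of 10 non-spliceogenic variants) and are an assumption of this sketch, not figures taken from the study's tables. The Wilson interval is chosen here for illustration only; it is not the method used by either study, and the function names are hypothetical.

```python
from math import sqrt


def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # spliceogenic variants correctly flagged
        "specificity": tn / (tn + fp),  # non-spliceogenic variants correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts, back-calculated from the reported 80% sensitivity and
# 60% specificity with 10 spliceogenic and 10 non-spliceogenic variants.
metrics = diagnostic_metrics(tp=8, fn=2, tn=6, fp=4)
sens_ci = wilson_interval(8, 10)
spec_ci = wilson_interval(6, 10)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
print(f"sensitivity 95% CI: {sens_ci[0]:.2f} to {sens_ci[1]:.2f}")
print(f"specificity 95% CI: {spec_ci[0]:.2f} to {spec_ci[1]:.2f}")
```

Under these assumptions, the interval for a sensitivity of 80% estimated from 10 variants spans roughly 0.49 to 0.94, which overlaps the 70% reported for the other two tools and shows how little such a difference can be relied upon.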