Discussion
The originality of the FLc-Capture method
The first distinctive feature of the FLc-Capture method is its use of SMART technology, which is widely adopted in cDNA cloning research, to synthesize cDNA probes. SMART technology ensures that each synthesized cDNA is full-length, comprising the open reading frame (ORF) and the untranslated regions (5’ UTR and 3’ UTR). This feature allows researchers to enrich coding and noncoding sequences from genomes simultaneously. In our snake case study, the final lengths of the ORF and UTR datasets are 817 K and 1,114 K, respectively. These values are relatively close, indicating that FLc-Capture can enrich both coding and noncoding sequences with similar efficiency.
Because cDNA sequences are discontinuous in genomes (interrupted by introns), using full-length cDNA probes directly to capture ORF and UTR regions from DNA libraries makes data post-processing more challenging. Another original aspect of FLc-Capture is therefore its data processing strategy. Because ORF sequences are split across exons in genomes, whereas UTR sequences are usually continuous, FLc-Capture uses two different approaches to extract ORF and UTR sequences from capture data. For ORFs, FLc-Capture first assembles reads into contigs, identifies exons within the contigs, and then maps the identified exons onto the reference coding sequences. Our study demonstrated that this “exon mapping” strategy can extract coding sequences from genetically distant samples (~15% divergence in our study) without the need for highly similar reference sequences. For UTRs, because each is most likely contained within a single assembled contig, FLc-Capture directly applies a mutual best-hit (MBH) strategy to identify UTR sequences orthologous to the reference UTRs. Our case study showed that these two purpose-built bioinformatics pipelines are effective and can extract thousands of ORF and UTR sequences from both ingroup species and more distantly related outgroup species.
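To make the mutual best-hit idea concrete, the following is a minimal Python sketch and not the FLc-Capture implementation itself. It assumes that similarity searches have already been run in both directions (reference UTRs against contigs, and contigs against reference UTRs) and that each hit is available as a (query, subject, score) tuple. The function names, the sequence identifiers, and the scores are hypothetical and serve only as illustration.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (query_id, subject_id, score); a higher score means a better match.
# This tuple layout is an assumption made for illustration.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the top-scoring subject for each query.

    On a tied score, the first hit encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(
    ref_vs_contig: Iterable[Hit], contig_vs_ref: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Keep (reference, contig) pairs that are each other's best hit."""
    forward = best_hits(ref_vs_contig)
    reverse = best_hits(contig_vs_ref)
    return sorted(
        (ref, contig)
        for ref, contig in forward.items()
        if reverse.get(contig) == ref
    )


# Hypothetical hit tables for three reference UTRs and three contigs.
ref_vs_contig = [
    ("utr1", "c1", 200.0),
    ("utr1", "c2", 150.0),
    ("utr2", "c2", 180.0),
    ("utr3", "c3", 90.0),
]
contig_vs_ref = [
    ("c1", "utr1", 200.0),
    ("c2", "utr1", 150.0),
    ("c2", "utr2", 180.0),
    ("c3", "utr1", 60.0),
    ("c3", "utr3", 50.0),
]

pairs = mutual_best_hits(ref_vs_contig, contig_vs_ref)
print(pairs)
```

In this toy input, utr3 selects contig c3 as its best hit, but c3 scores higher against utr1, so the pair is not mutual and is discarded. This reciprocity requirement is what makes an MBH criterion more conservative than a one-directional best hit.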