Discussion
The originality of the FLc-Capture method
The first distinctive feature of the FLc-Capture method is its use of
SMART technology, which is widely adopted in cDNA cloning research, to
synthesize cDNA probes. SMART technology ensures that the synthesized
cDNAs are full-length, each consisting of an open reading frame (ORF)
and untranslated regions (5’ UTR and 3’ UTR). This feature allows
researchers to enrich coding and noncoding sequences from genomes
simultaneously.
In our snake case study, the final lengths of the ORF and UTR datasets
are 817 K and 1,114 K, respectively. These lengths are comparable,
indicating that FLc-Capture can enrich both coding and noncoding
sequences with similar efficiency.
Because cDNA sequences are discontinuous in genomes (interrupted by
introns), directly using full-length cDNA probes to capture ORF and UTR
regions from DNA libraries makes data post-processing more challenging.
The second distinctive feature of FLc-Capture is therefore its data
processing strategy. Because ORF sequences are fragmented in genomes,
whereas UTR sequences are usually continuous, FLc-Capture uses two
different approaches to extract ORF and UTR sequences from the capture
data.
For ORFs, FLc-Capture first assembles reads into contigs, identifies
exons in the contigs, and then maps the identified exons onto the
reference coding sequences. Our study demonstrated that this "exon
mapping" strategy can extract coding sequences from genetically distant
samples (~15% divergence in our study) without the need for highly
similar reference sequences.
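
The last step of the exon mapping idea, placing identified exons onto a
reference coding sequence, can be illustrated with a minimal sketch. This
is not the FLc-Capture implementation: the `ExonHit` record, the
`assemble_cds` function, and the simplifying assumptions (each exon is
already aligned to the reference without indels, and uncovered positions
are filled with `N`) are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class ExonHit(NamedTuple):
    """Hypothetical record: one exon placed on a reference coding sequence."""
    ref_start: int  # 0-based start position on the reference CDS
    sequence: str   # exon sequence, assumed aligned without indels


def assemble_cds(ref_length: int, hits: List[ExonHit], gap: str = "N") -> str:
    """Lay exons onto reference coordinates; uncovered positions stay as gaps."""
    cds = [gap] * ref_length
    # Process exons in reference order; where exons overlap, the base
    # written first is kept.
    for hit in sorted(hits, key=lambda h: h.ref_start):
        for offset, base in enumerate(hit.sequence):
            pos = hit.ref_start + offset
            if 0 <= pos < ref_length and cds[pos] == gap:
                cds[pos] = base
    return "".join(cds)


# Two exons recovered from different contigs, given in arbitrary order.
example = assemble_cds(12, [ExonHit(6, "GGGT"), ExonHit(0, "ATGC")])
print(example)
```

In this toy example the two exons come from separate contigs, yet they
end up in the correct order in one coding sequence because their
positions are defined by the reference rather than by the contigs.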
For UTRs, because each is most likely contained within a single
assembled contig, FLc-Capture directly applies a mutual best-hit (MBH)
strategy to identify UTR sequences orthologous to the reference UTRs.
Our case study showed that these two purpose-built bioinformatics
pipelines are effective: they extracted thousands of ORF and UTR
sequences from both ingroup species and more distantly related outgroup
species.
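
The mutual best-hit criterion itself can be sketched in a few lines. The
sketch below assumes that similarity searches have already been run in
both directions (reference UTRs against contigs, and contigs against
reference UTRs) and that each hit is reduced to a (query, subject, score)
tuple. The function names, the tuple layout, and the use of a single
score to rank hits are assumptions made for illustration, not details of
the FLc-Capture pipeline.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, score); assumed layout


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(ref_vs_contigs: Iterable[Hit],
                     contigs_vs_ref: Iterable[Hit]) -> Dict[str, str]:
    """Keep reference/contig pairs that are each other's best hit."""
    forward = best_hits(ref_vs_contigs)
    backward = best_hits(contigs_vs_ref)
    return {ref: contig for ref, contig in forward.items()
            if backward.get(contig) == ref}


# Toy hits: utrB's best contig is c2, but c2's best reference is utrA,
# so only the utrA/c1 pair is reciprocal.
ref_vs_contigs = [("utrA", "c1", 200.0), ("utrA", "c2", 150.0),
                  ("utrB", "c2", 180.0)]
contigs_vs_ref = [("c1", "utrA", 200.0), ("c2", "utrA", 150.0),
                  ("c2", "utrB", 140.0)]
pairs = mutual_best_hits(ref_vs_contigs, contigs_vs_ref)
print(pairs)
```

Requiring reciprocity discards one-directional matches such as utrB to
c2 in the toy data, which is what makes the criterion a conservative
filter for orthology.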