3.3 | Effects of using different in silico mate pairs on genome assemblies of T. bimaculatus
Assembling the genome of T. bimaculatus using only the paired-end reads yielded an NGA50 of 4.7 Kb and 1,626 complete BUSCOs (Table 2). Both the original in silico method and the optimized in silico method substantially improved the genome assembly of T. bimaculatus. Compared with the original in silico method (using one reference from the same genus, ‘rub’: T. rubripes or ‘fla’: T. flavidus), the optimized in silico method (using two references from the same genus, ‘rub’ and ‘fla’) increased the NGA50 (rub*: 140.2 Kb; fla*: 131.4 Kb vs. rub-fla**: 183.8 Kb) and markedly reduced misassemblies (rub*: 5,143; fla*: 5,148 vs. rub-fla**: 4,188), with a comparable number of complete BUSCOs (rub*: 2,358; fla*: 2,366 vs. rub-fla**: 2,367).
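To put the two-reference comparison on a common scale, the reported values can be expressed as percent changes relative to each single-reference assembly. The snippet below is only an illustrative sketch: the numbers are copied from the text above, while the function name `relative_change` and the dictionary layout are assumptions introduced here and are not part of the original analysis.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the percent change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Values reported in the text (Table 2); keys are the labels used in the paper.
nga50_kb = {"rub": 140.2, "fla": 131.4, "rub-fla": 183.8}
misassemblies = {"rub": 5143, "fla": 5148, "rub-fla": 4188}

for ref in ("rub", "fla"):
    nga50_delta = relative_change(nga50_kb[ref], nga50_kb["rub-fla"])
    misasm_delta = relative_change(misassemblies[ref], misassemblies["rub-fla"])
    print(f"{ref} -> rub-fla: NGA50 {nga50_delta:+.1f}%, "
          f"misassemblies {misasm_delta:+.1f}%")
```

Under these figures, the NGA50 gain is roughly 31% to 40% and the reduction in misassemblies is roughly 19%, depending on which single reference is taken as the baseline.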
Compared with the original in silico method, the optimized in silico method that generated conserved mate pairs using more than two reference genomes (3 references: two from the same genus, ‘rub’ and ‘fla’, and one from the same family, ‘nig’; 4 references: two from the same genus, ‘rub’ and ‘fla’, one from the same family, ‘nig’, and one from the same order, ‘mol’) drastically reduced misassemblies (rub*: 5,143; fla*: 5,148; nig*: 5,843; mol*: 4,132 vs. rub-fla-nig**: 2,159; rub-fla-nig-mol**: 1,796), but failed to increase either the NGA50 (rub*: 140.2 Kb; fla*: 131.4 Kb; nig*: 7.2 Kb; mol*: 4.7 Kb vs. rub-fla-nig**: 7.5 Kb; rub-fla-nig-mol**: 4.6 Kb) or the number of complete BUSCOs (rub*: 2,358; fla*: 2,366; nig*: 1,772; mol*: 1,625 vs. rub-fla-nig**: 1,842; rub-fla-nig-mol**: 1,671).
We compared the mate pairs generated using one reference genome (T. rubripes) with the conserved mate pairs generated using two reference genomes (T. rubripes and T. flavidus). We found that the extra mate pairs generated using one reference were mostly inverted on the target genome (60.03% to 66.62%), while the remaining mate pairs either showed a length deviation on the target genome or were mapped to different scaffolds of the target genome (Table S12).
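The three categories described above (inverted, length deviation, different scaffolds) can be illustrated with a minimal classification sketch. This is not the procedure used to produce Table S12. The record type `MappedPair`, the expected forward-reverse orientation, and the 20% length tolerance are all hypothetical choices made here for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedPair:
    """Hypothetical record of one in silico mate pair mapped to the target genome."""
    scaffold1: str
    pos1: int
    strand1: str  # "+" or "-"
    scaffold2: str
    pos2: int
    strand2: str  # "+" or "-"
    insert_size: int  # insert size expected from the reference genome


def classify(pair: MappedPair, tolerance: float = 0.2) -> str:
    """Assign a mate pair to one category (assumed rules, checked in order)."""
    if pair.scaffold1 != pair.scaffold2:
        return "different_scaffolds"
    # Order the two reads by position; the assumed expectation is that the
    # leftmost read maps to "+" and the rightmost read maps to "-".
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if (left[1], right[1]) != ("+", "-"):
        return "inverted"
    observed = right[0] - left[0]
    if abs(observed - pair.insert_size) > tolerance * pair.insert_size:
        return "length_deviation"
    return "consistent"


def category_percentages(pairs, tolerance: float = 0.2) -> dict:
    """Return the percentage of mate pairs falling into each category."""
    counts = Counter(classify(p, tolerance) for p in pairs)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Applied to the mate pairs that are present in the one-reference set but absent from the two-reference conserved set, a tally of this kind yields per-category percentages comparable in form to those summarized in Table S12.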