Data Accessibility and Benefit-Sharing
The source code of the program and the user manual are freely available
at https://github.com/wpwupingwp/OGU under the GNU Affero General Public
License (AGPL-3). Datasets used for benchmarking and related outputs
from OGU have been deposited in Zenodo
(https://dx.doi.org/10.5281/zenodo.10695931).
Benefits from this research accrue from the sharing of our software,
data and results on public databases and code repositories as described
above.
Author Contributions
Ping Wu wrote the program and the manuscript and conducted the analysis.
Ningning Xue proofread the manuscript and contributed to the design and
testing of the Primer module. Jie Yang tested and optimized the GB2fasta
module. Qiang Zhang contributed to the implementation of phylogenetic
diversity. Yuzhe Sun and Wen Zhang tested and advised on the Evaluate
module.
Figure legends
Figure 1: The performance of evaluation methods on Lamiaceae data
A) Correlation coefficient matrix of different sequence variance
indicators for the “default” dataset. Numbers in the matrix are shown in
black or white only to distinguish them from the background. B) Effects
of alignment gaps and ambiguous bases on sequence polymorphism
evaluation. C) Determination of the most variable fragment by
different methods. D) Relationship between PD, PD-stem and PD-terminal.
“Observed_Res” represents the observed resolution method, and
“Tree_Res” represents the tree resolution method.
Figure 2: Sequence variance of different kinds of regions in 308 plastid
genomes
A) GC ratio and gap ratio of five kinds of fragments in 308 angiosperm
plastid genomes. Filled boxes are GC ratio and border-only boxes are gap
ratio. B) Pi and tree resolution of fragments. Left axis is for Pi and
right is for tree resolution. C) PD-stem and PD-terminal of fragments.
D) Circular plot of sequence variance. Fragments are ordered according
to the plastid genome structure of tobacco; white regions indicate that
the sequences used for analysis do not contain the fragment at the
corresponding position in the tobacco plastid genome. One inverted
repeat region is omitted for convenience.
Supplemental information
S1. Schematic diagram of stem and terminal phylogenetic diversity
S2. Extraction results on 1 million random GenBank records
S3. Extraction results of one million random GenBank records
S4. Evaluation results of Lamiaceae data
S5. Top 10 most variable Lamiaceae loci
S6. Sliding window analysis of Lamiaceae rbcL
S7. Lamiaceae rbcL multiple sequence alignment result
S8. Universal primer design results of Lamiaceae rbcL
S9. Evaluation results of 308 angiosperm plastid genomes
S10. Significance test results of 308 angiosperm plastid genomes
S11. Variance of 30 selected plastid intergenic spacers
S12. Consensus tree of CDS data from 308 angiosperm families
S13. Consensus tree of spacer data from 308 angiosperm families
S14. Evaluation results of rodent data
S15. Visualization of rodent mitochondrial genome variance