Project name: Sewing machine pipeline
Project home page: The Sewing machine script and tutorial are available at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/blob/master/KSU_bioinfo_lab/stitch/sewing_machine_LAB.md.
Operating system(s): Linux (tested on CentOS 7, Gentoo and Ubuntu).
Programming language: Perl, Rscript, Bash
License: Pipeline script and tutorial are available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: Sewing machine requires BioPerl and BNGCompare. RefAligner is also required between iterations and is available on request from BioNano Genomics http://www.bionanogenomics.com/.
Project name: “Raw data-to-finished assembly and assembly analysis” pipeline
Project home page: The pipeline script and tutorial are available at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/blob/master/KSU_bioinfo_lab/assemble_XeonPhi/assemble_XeonPhi_LAB.md.
Operating system(s): Linux CentOS 7 on a Xeon Phi server with 1488 threads (6 × 60 × 4 Xeon Phi co-processor threads + 24 × 2 Xeon host threads), 256 GB of host RAM and 6 × 8 GB of Xeon Phi RAM.
Programming language: Perl, Rscript, Bash
License: Pipeline script and tutorial are available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: AssembleIrysXeonPhi.pl and AssembleIrysCluster.pl require DRMAA job submission libraries. RefAligner and Assembler are also required and are available on request from BioNano Genomics http://www.bionanogenomics.com/.
Project name: “Raw data-to-finished de novo assembly and assembly analysis” pipeline
Project home page: The pipeline script and tutorial are available at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/blob/master/KSU_bioinfo_lab/assemble_XeonPhi/assemble_XeonPhi_de_novo_LAB.md.
Operating system(s): Linux CentOS 7 on a Xeon Phi server with 1488 threads (6 × 60 × 4 Xeon Phi co-processor threads + 24 × 2 Xeon host threads), 256 GB of host RAM and 6 × 8 GB of Xeon Phi RAM.
Programming language: Perl, Rscript, Bash
License: Pipeline script and tutorial are available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: AssembleIrysXeonPhi.pl and AssembleIrysCluster.pl require DRMAA job submission libraries. RefAligner and Assembler are also required and are available on request from BioNano Genomics http://www.bionanogenomics.com/.
Project name: AssembleIrysXeonPhi.pl / AssembleIrysCluster.pl
Project home page: AssembleIrysXeonPhi scripts are available at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/blob/master/KSU_bioinfo_lab/assemble_XeonPhi/AssembleIrysXeonPhi.pl. The currently unsupported AssembleIrysCluster scripts are available on GitHub at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/tree/master/KSU_bioinfo_lab/assemble_SGE_cluster
Operating system(s): Linux CentOS 7 on a Xeon Phi server with 1488 threads (6 × 60 × 4 Xeon Phi co-processor threads + 24 × 2 Xeon host threads), 256 GB of host RAM and 6 × 8 GB of Xeon Phi RAM (AssembleIrysXeonPhi.pl); an SGE Linux cluster, tested on Gentoo (AssembleIrysCluster.pl).
Programming language: Perl, Rscript, Bash
License: AssembleIrysXeonPhi.pl and AssembleIrysCluster.pl are available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: AssembleIrysXeonPhi.pl and AssembleIrysCluster.pl require DRMAA job submission libraries. RefAligner and Assembler are also required and are available on request from BioNano Genomics http://www.bionanogenomics.com/.
Project name: stitch.pl
Project home page: Stitch scripts are available on GitHub at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/tree/master/KSU_bioinfo_lab/stitch
Operating system(s): Mac and Linux (tested on Gentoo and Ubuntu)
Programming language: Perl, Rscript, Bash
License: stitch.pl is available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: stitch.pl requires BioPerl. RefAligner and Assembler are also required between iterations and are available on request from BioNano Genomics http://www.bionanogenomics.com/.
Project name: BNGCompare.pl, bnx_stats.pl, cmap_stats.pl and xmap_stats.pl
Project home page: All scripts are available on GitHub at https://github.com/i5K-KINBRE-script-share/Irys-scaffolding/tree/master/KSU_bioinfo_lab/map_tools and https://github.com/i5K-KINBRE-script-share/BNGCompare
Operating system(s): Mac and Linux (tested on Gentoo and Ubuntu)
Programming language: Perl, Rscript, Bash
License: bnx_stats.pl, cmap_stats.pl and xmap_stats.pl are available free of charge to academic and non-profit institutions.
Any restrictions to use by non-academics: Please contact authors for commercial use.
Dependencies: bnx_stats.pl, cmap_stats.pl and xmap_stats.pl have no dependencies.
JMS, MCC, NH, NL and SJB declare that they have no competing interests. ETL, PS and TA are employees of BioNano Genomics and hold stock options.
MCC isolated the high molecular weight DNA and generated the image files on the Irys system. ETL and JMS developed the assembly workflow. JMS wrote most of the code in the Irys-scaffolding GitHub repository (Stitch, AssembleIrysXeonPhi, AssembleIrysCluster, etc.). NH assisted with the initial code review of analyze_irys_output (the precursor to Stitch) and prepared Tcas5.0. JMS and NL manually edited Tcas5.1. JMS performed the data analyses. TA contributed to the sections discussing BioNano RefAligner and Assembler. PS contributed to the interpretation of results. JMS and SJB did most of the writing, with contributions from all authors. All authors read and approved the final manuscript.
This project was supported by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20 GM103418. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.
Data for Additional file 1 and Additional file 6 were kindly made available by P.A. Larsen, J. Rogers, A.D. Yoder and the Duke Lemur Center.
The Tribolium castaneum genome project is part of the i5k Genome Sequencing Initiative for Insects and Other Arthropods.
Matthias Weissensteiner & Jochen Wolf, Uppsala University. Stephen Schaeffer from The Pennsylvania State University and Stephen Richards from the Baylor College of Medicine Human Genome Sequencing Center for the use of the D. pseudoobscura data. Mike Kanost from Kansas State University. Jeff Maughan from Brigham Young University for the use of the Amaranth data. The Udall Lab from Brigham Young University and Cotton Inc. for the use of the cotton data. Grant NSF 1237993 for the use of the Medicago data. Christopher Cunningham, University of Georgia, for the use of the Nicrophorus data. Catherine Peichel from the Fred Hutchinson Cancer Research Center and Michael White from the University of Georgia for the Gasterosteus data. Mirkó Palla, Ph.D., Wyss Institute Postdoctoral Fellow, Church Laboratory, Department of Genetics, Harvard Medical School, and George Church, Ph.D., Wyss Institute Core Faculty Member, Robert Winthrop Professor of Genetics at Harvard Medical School, Professor of Health Sciences and Technology at Harvard and MIT, and Senior Associate Member at the Broad Institute of Harvard and MIT, for the Escherichia coli data.
Minimum molecule map length (kb) | Molecule map N50 (kb) | Cumulative length (Mb) | Number of molecule maps |
---|---|---|---|
100 | 165.35 | 82,738.71 | 503,414 |
150 | 202.64 | 50,579.12 | 239,558 |
180 | 232.57 | 34,287.15 | 139,949 |
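The metrics in the table above can be derived from a list of molecule map lengths. The Python sketch below is a minimal illustration, not the bnx_stats.pl implementation: it assumes the lengths have already been parsed from the BNX file into kilobases, treats the minimum-length cutoff as inclusive (an assumption), and uses the usual definition of N50, i.e. the length at which maps sorted from longest to shortest first reach half of the cumulative length. The function and type names are hypothetical.

```python
from typing import Iterable, NamedTuple


class MapMetrics(NamedTuple):
    """Summary metrics for one minimum-length cutoff (all lengths in kb)."""
    min_length_kb: float
    n50_kb: float
    cumulative_length_kb: float
    count: int


def n50(lengths: Iterable[float]) -> float:
    """Return the N50: the length L such that maps of length >= L
    together account for at least half of the cumulative length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0.0  # empty input


def molecule_map_metrics(lengths_kb: Iterable[float], min_length_kb: float) -> MapMetrics:
    """Keep maps at least min_length_kb long (inclusive cutoff is an assumption)
    and summarise them as N50, cumulative length and count."""
    kept = [x for x in lengths_kb if x >= min_length_kb]
    return MapMetrics(min_length_kb, n50(kept), sum(kept), len(kept))


if __name__ == "__main__":
    # Hypothetical toy lengths in kb, not data from this study.
    toy = [90, 120, 160, 200, 250, 400]
    for cutoff in (100, 150, 180):
        print(molecule_map_metrics(toy, cutoff))
```

Raising the cutoff removes short maps, so the count and cumulative length can only fall while the N50 stays the same or rises, which is the pattern seen across the 100, 150 and 180 kb rows.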
Dataset | N50 (Mb) | Number | Cumulative length (Mb) |
---|---|---|---|
Tcas5.0 sequence scaffolds | 1.16 | 2240 | 160.74 |
Tcas5.0 in silico maps | 1.20 | 223 | 152.53 |
Consensus genome maps | 1.35 | 216 | 200.47 |
Tcas5.1 sequence scaffolds | 3.85 | 2148 | 165.72 |
Tcas5.2 sequence scaffolds | 4.46 | 2150 | 165.92 |
Tcas BioNano hybrid scaffolds | 1.83 | 2210 | 175.54 |
Dataset | Breadth of alignment coverage (Mb) | Total alignment length (Mb) | Percent of CMAP aligned (%) |
---|---|---|---|
Tcas5.0 in silico maps | 124.04 | 132.40 | 81 |
Consensus genome maps | 131.64 | 132.34 | 67 |
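As we read the column names, breadth of alignment coverage counts each aligned position on the CMAP once, whereas total alignment length counts it once per alignment, so total length can exceed breadth where regions align more than once. The Python sketch below illustrates that distinction. It assumes alignments are given as (start, end) intervals on the CMAP and that “Percent of CMAP aligned” is breadth divided by the CMAP's cumulative length; both assumptions and all function names are hypothetical and are not taken from BNGCompare.pl.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) on the CMAP, e.g. in Mb; end > start


def total_alignment_length(intervals: List[Interval]) -> float:
    """Sum of aligned lengths; overlapping alignments are counted every time."""
    return sum(end - start for start, end in intervals)


def breadth_of_coverage(intervals: List[Interval]) -> float:
    """Length of the union of aligned intervals; overlaps are counted once."""
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def percent_aligned(intervals: List[Interval], cmap_length: float) -> float:
    """Breadth as a percentage of the CMAP's cumulative length (assumed definition)."""
    return 100.0 * breadth_of_coverage(intervals) / cmap_length


if __name__ == "__main__":
    # Hypothetical toy alignments: two overlap, one is separate.
    toy = [(0.0, 10.0), (5.0, 15.0), (20.0, 25.0)]
    print(total_alignment_length(toy), breadth_of_coverage(toy), percent_aligned(toy, 40.0))
```

Sorting the intervals first lets the union be computed in a single pass, so the cost is dominated by the O(n log n) sort.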
ChLG | Tcas5.0 scaffolds | Unplaced scaffolds added in Tcas5.2 | Tcas5.2 super scaffolds |
---|---|---|---|
X | 13 | +2 | 2 |
2 | 18 | +1 | 10 |
3 | 29 | +4 | 20 |
4 | 6 | +2 | 2 |
5 | 17 | +1 | 4 |
6 | 12 | +6 | 6 |
7 | 15 | - | 6 |
8 | 14 | +1 | 8 |
9 | 21 | - | 9 |
10 | 12 | +2 | 10 |
Total | 157 | 19 | 77 |
Bases per pixel (bpp) is plotted for scans \(1, \ldots, n\) of each flowcell of mouse lemur molecules (purple). The first scan of each flowcell is indicated by a grey dashed line. The pre-adjusted molecule map stretch was determined by aligning molecule maps to the in silico maps. Data made available by P.A. Larsen, J. Rogers, A.D. Yoder and the Duke Lemur Center.
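Because molecule map label positions are converted from image pixels to base pairs using a bases-per-pixel value, a scan imaged at a different bpp yields maps that are uniformly stretched or compressed. The Python sketch below illustrates the kind of linear correction involved. It assumes positions were written at a nominal 500 bpp and that adjustment is a simple rescaling by measured bpp / nominal bpp; the function name and the nominal value are assumptions for illustration, not RefAligner's implementation.

```python
from typing import List

# Assumed nominal bases-per-pixel used when positions were first written.
NOMINAL_BPP = 500.0


def rescale_positions(positions_bp: List[float], measured_bpp: float,
                      nominal_bpp: float = NOMINAL_BPP) -> List[float]:
    """Rescale label positions recorded at nominal_bpp to a measured bpp.

    A position in bp is pixel_offset * bpp, so if a scan was actually
    imaged at measured_bpp, every position scales by measured_bpp / nominal_bpp.
    """
    if measured_bpp <= 0 or nominal_bpp <= 0:
        raise ValueError("bases-per-pixel values must be positive")
    factor = measured_bpp / nominal_bpp
    return [p * factor for p in positions_bp]


if __name__ == "__main__":
    # Hypothetical label positions (bp) from one scan measured at 510 bpp.
    print(rescale_positions([1000.0, 5000.0], 510.0))
```

Estimating bpp separately for each scan, as plotted in this figure, matters because a single global factor cannot correct scans whose stretch drifts over the run.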
Detailed metrics for molecule maps in each BNX file (cumulative length and number of maps). Columns include the cumulative length of molecule maps \(>150\) kb, the number of molecule maps \(>150\) kb and the date each BNX file was generated.
Detailed metrics for molecule maps, including map N50, cumulative length and number of maps. Figures show histograms of per-molecule-map quality metrics, including length, molecule map SNR and intensity, label count, label SNR and label intensity. Molecule maps are filtered for minimum molecule lengths of 100, 150 or 180 kb.
Detailed assembly metrics for consensus genome maps assembled using relaxed, default and strict “-T” (p-value threshold) parameters, named Relaxed-T, Default-T and Strict-T, respectively. The best “-T” parameter was then used for two additional assemblies with either a relaxed minimum molecule map length (relaxed-minlen) of 100 kb, rather than the 150 kb default, or a strict minimum molecule map length (strict-minlen) of 180 kb.
Alignments of Tcas5.0 and Tcas5.2 in silico maps to consensus genome maps for all ChLGs. Consensus genome maps (blue, with molecule coverage shown in dark blue) are aligned to the in silico maps (green, with contigs overlaid as translucent colored squares). Alignments to both Tcas5.2 super scaffolds (top alignment) and Tcas5.0 scaffolds (bottom alignment) are shown.
We examined experiments from 16 different genera to determine whether the results observed for the Tribolium castaneum genome are typical of other genomes.