Ino de Bruijn Init pdf of report  over 9 years ago

Commit id: 7b906f8e9e3df89cda918182ad016493e481e4bf


      Binary files /dev/null and b/plos/plos_template.pdf differ        

% Version 2.0 July 2014
%
% To compile to pdf, run:
% latex plos_template.tex
% bibtex plos_template.tex
% latex plos_template.tex
% latex plos_template.tex
% dvipdf plos_template.tex
%
% % % % % % % % % % % % % % % % % % % % % %
%

%\usepackage{setspace}
%\doublespacing
%TODO: remove for production
\usepackage{graphicx}
% Text layout
\topmargin 0.0cm

% Title must be 150 characters or less
\begin{flushleft}
{\Large
\textbf{Metagenomic Assembly Validation of an {\em in vitro} Mock Community}
}
% Insert Author names, affiliations and corresponding author email.
\\
Ino de Bruijn$^{1,2,\ast}$,
Johannes Alneberg$^{1}$,
Linda D'Amore,
Neil Hall,
Umer Z. Ijaz$^{3}$,
Christopher Quince$^{3}$,
Anders F. Andersson$^{1}$
\\
\bf{1} KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden
\\
\bf{2} BILS Bioinformatics Infrastructure for Life Sciences, Stockholm, Sweden
\\
\bf{3} University of Glasgow, Glasgow, UK
\\
$\ast$ E-mail: [email protected]
\end{flushleft}
% Please keep the abstract between 250 and 300 words
\section*{Abstract}
Single genome assembly algorithms have been benchmarked with real sequencing data in the assembly challenges Assemblathon and GAGE. {\em De novo} metagenomic assembly algorithms have so far only been evaluated using simulated reads. In this paper we present a benchmark using an {\em in vitro} mock community of 52 species with known reference genomes. The mock community was prepared in two abundance configurations: an even distribution and a log-normal distribution similar to distributions of phyla in soil. The communities were sequenced in Illumina HiSeq paired-end mode. The data is openly available for other researchers to experiment on. Here, the reads have been used to test twenty-one assembly recipes, each a combination of Velvet, Meta-Velvet, Ray, Minimus2, Newbler and Bambus2. The assemblies are assessed on coverage of the reference genomes and on the purity per contig. Purity is a ratio based on the best alignment per contig as determined with MUMmer. We show that many impure contigs are constructed, both for the even community and for the log-normal community. There is a clear tradeoff between contig length and contig purity. Velvet performs best in terms of purity and coverage of the references, while Velvet or Ray followed by a kmer merging step with Minimus2 or Newbler gives the longest contigs covering the references with a minor decrease in purity. We furthermore show that a simple rule of thumb for obtaining pure contigs is to select those with high coverage.
%TODO: get right number of words, include references
% Please keep the Author Summary between 150 and 200 words
% Use first person. PLOS ONE authors please skip this step.

\section*{Introduction}
Metagenomics, the sequencing of environmental DNA, has proven to be a promising approach for the discovery and investigation of microbes that cannot be cultured in the laboratory \cite{Eisen17355177} as well as for the study of both free-living microbial communities \cite{Andersson18497291} and microbial communities inside other organisms \cite{Qin20203603,Hess21273488}.\\

In a typical shotgun metagenomics experiment the DNA of a community is isolated and high throughput sequencing is performed on a random sample of the isolated DNA \cite{Morgan20419134}. The reads can either be analyzed as such, e.g. by BLAST searches against reference databases to obtain a functional profile of the microbial community \cite{Tringe15845853}, or they can be assembled to form longer stretches of DNA stemming from the same or closely related organisms, which can subsequently be analyzed with regard to phylogenetic affiliation and functional properties. The output of the assembly process often includes scaffolds, contigs and unassembled reads \cite{Mavromatis17468765}. One of the problems with assembly is that chimeric contigs or scaffolds may be formed. Closely related sequences are more likely to form chimeras, and since closely related strains often occur in the same environment this is a challenge. Also, it is difficult to determine whether a chimera is natural, due to homologous recombination, or an error in the assembly process \cite{Tyson14961025}. Another problem with assembly is variation in gene content among closely related strains, since a gene inserted in a subpopulation will cause conflicting assembly results \cite{Hallam17114289}. After assembling the reads, a process called binning is performed, where the resulting scaffolds and contigs are assigned to phylogenetically related groups.
Finally, gene calling and functional annotation are performed on the scaffolds.\\
% rewrite upper part (maybe take some parts from theoretical background)

In this study several recipes for {\em de novo} assembly of metagenomic data have been evaluated. In an {\em in silico} comparison of Illumina, Sanger and 454 sequencing with respect to sequencing cost and resulting coverage of microbial communities, Illumina short read libraries were shown to be the best for communities of medium complexity \cite{Mende22384016}. Therefore we have chosen to assess the assembly recipes for Illumina paired-end short reads specifically. Previous studies have mostly used {\em in silico} metagenomic data sets \cite{Pignatelli21625384,Mavromatis17468765}. In contrast, the community of our study is an {\em in vitro} simulated metagenome consisting of 52 species with completed or nearly completed genomes, so the quality of our assessment does not depend on how realistic read simulators are. An even and an uneven distribution of the 52 species were created {\em in vitro}. The community has been sequenced with different types of library preparation so that the effect of library preparation can be tested as well. The following assembly programs have been tested: Velvet \cite{Zerbino18349386}, Meta-Velvet \cite{MetaVelvet}, Newbler \cite{Quinn18755037}, Minimus2 \cite{Sommer17324286}, Ray Meta \cite{Boisvert23259615} and Bambus2 \cite{Koren21926123}. The quality of the assemblies has been evaluated by mapping the constructed contigs or scaffolds to the collection of reference genomes, hereafter referred to as the reference metagenome.
In addition two pipelines have been constructed, one to perform the assemblies and another to perform the validation, provided that a reference metagenome is available.\\
% related work

In a study by \cite{Mavromatis17468765} three genome assemblers were evaluated: Phrap \cite{delaBastide18428783}, Arachne \cite{Batzoglou11779843} and Jazz \cite{Aparicio12142439}. For the evaluation three artificial communities of low, medium and high complexity were constructed by selecting Sanger reads from 113 isolate genomes. The low complexity community had one dominating population with several low-abundance ones, the medium complexity community had more than one dominating population and the high complexity community had no dominating population at all. Resulting contigs were evaluated on chimericity and length distribution. Compared to using the original reads for gene annotation, assembly was demonstrated to give up to 20\% increase in accurate gene prediction and a slightly better increase for inaccurate and missed genes. Sanger reads of 700 bp were used. This approach of using artificial communities has subsequently been used in adapted versions by several other assembly evaluation papers \cite{Pignatelli21625384,Mende22384016}. In the benchmark by \cite{Pignatelli21625384} the reads of the artificial communities were changed from Sanger to 454 and Illumina. For the Illumina reads, SSAKE \cite{Warren17158514} and Velvet were used to perform the assembly. No difference in chimericity between the simulated 454 reads and the Illumina reads was observed. The main cause of chimericity was sequence similarity of the organisms; no relation with genome coverage was found. At the functional level metagenomic assembly turned out to be counterproductive compared to using the original reads for annotation.
\cite{Mende22384016} used metagenomes of 10, 100 and 400 species with simulated Illumina, 454 and Sanger reads, where the number of reads for each technology was chosen such that the sequencing cost was kept constant. All of the technologies provided similar coverage for 10 species. Illumina was superior for 100 species due to the higher coverage one can get for a similar price. Sanger performed best for 400 species because of its longer read length. Sanger reads were assembled with Arachne, 454 reads with Celera \cite{Myers10731133} and Illumina reads with SOAPdenovo \cite{Li20019144}. In contrast to the study of \cite{Pignatelli21625384} a year earlier, the authors concluded that assembling reads into contigs improves functional annotation of the metagenome. Furthermore, using Illumina paired-end data to determine contig links and construct scaffolds, although introducing more chimerism, resulted in an even better functional annotation. Beyond using simulated reads or real reads of {\em in silico} communities, there has not yet been a comparison of assembly algorithms using an {\em in vitro} community. {\em In vitro} communities have previously been used with success to assess DNA extraction techniques for sequencing a low
%TODO find number of genomes
complexity community of nine bacterial genera \cite{Willner22514642}, an oral community \cite{Diaz22520388}, the human gut \cite{Wu20673359} and the human microbiome \cite{HMPC22699610}. The advantage of using an {\em in vitro} community for assembly evaluation is that one does not have to rely on the correctness of sequencing simulators; the assessment is limited only by how similar the {\em in vitro} community is to a real community.

%Say something about GAGE and Assemblathon
% You may title this section "Methods" or "Models".
% "Models" is not a valid title for PLoS ONE authors.
% However, PLoS ONE authors may use "Analysis"
\section*{Materials and Methods}
To determine the quality of metagenomic assembly a mock community of species with known genomes was constructed {\em in vitro} and sequenced with Illumina. The resulting reads have been assembled using a combination of Velvet, Meta-Velvet, Ray, Minimus2, Newbler and Bambus2, resulting in twenty-one different assembly recipes (see Figure \ref{fig:asmstrat} and Table \ref{tab:asmstrat}). The recipes stem from current literature and our own ideas.

\subsection*{Mock community}
The sequenced mock community consisted of 59 species. The species have been chosen such that there are a number of closely related organisms and more distant ones. The number of species is about equal to the number of species one would find in the human gut. The abundances of DNA from each species have been fixed in two types of configurations before sequencing. In the first configuration, the even configuration, all species have approximately equal genome copy numbers. In the second configuration, the uneven configuration, the phyla are mixed in proportions similar to log-normal distributions of phyla in soil \cite{Doroghazi18682841}. The samples have been prepared with the Nextera 1 ng sample preparation kit. The size of the entire reference metagenome is about 200 Mb. Mock community preparations and sequencing were performed by our collaborators, Christopher Quince at the University of Glasgow and Linda D'Amore and Neil Hall at the University of Liverpool's Centre for Genomic Research. Sequencing of the even and uneven community resulted in about 7.9 Gb and 6.7 Gb respectively.

\subsection*{Quality trimming}
Before assembling the reads one often starts by pre-processing them with quality trimming and/or removal of PCR duplicates. \cite{Mende22384016} demonstrated that quality trimming could drastically improve the assembly.
Before each assembly the same quality trimming procedure has been performed. For quality trimming the program sickle was used (see Table \ref{tab:programversions}). Reads were trimmed from the 3' end if the average quality score was below 20 in a window of 10 bases. If the resulting read was shorter than 20 bases, it was discarded. Only intact pairs were used in the subsequent assembly, not the single reads.

\subsection*{Reference genome filtering}
Some of the reference genomes were not similar enough to the genomes in the mock community for a fair comparison. We therefore selected only those references that had at least 90\% of the genome covered by pairs stemming from the community with even abundances per genome. The quality trimmed pairs that did not align properly against this subset of 52 references were discarded. The references and their GIDs can be found in Supplementary Table S1. After filtering the pairs there were 3.8 Gb and 3.1 Gb left for the even and uneven community respectively.

% The V3 (192 bp) and V4 (291 bp) of the 16S genes have been amplified and the samples have been sequenced with Illumina.

\subsection*{Assembly}
In the assembly procedure reads are combined into contiguous sequences called contigs. Contigs can afterwards be joined into longer scaffolds using paired read information. In the scaffolding process contigs might be extended and repeats might be resolved, so scaffolding is not restricted to just the ordering of contigs.\\

There is a plethora of different assemblers available, and by pre-processing reads and combining different assemblers an even larger number of assembly recipes is possible. Velvet is one of the most used assembly programs and was therefore included in this assessment. Velvet's metagenomic counterpart, Meta-Velvet, is run on the output of Velvet, so it is possible to determine how the metagenome-specific parameters affect the assembly.
Another popular assembler for metagenomics is Ray \cite{Boisvert23259615}. Ray is based on MPI and can run across multiple nodes, distributing both memory and processor load, which makes it an ideal candidate for large metagenomic projects.\\

\subsection*{Contiging}
Velvet, Ray and Meta-Velvet all use a de Bruijn graph to determine overlaps between reads. This involves cutting the reads into kmers of a specified size $k$ and letting edges represent overlaps between consecutive kmers, i.e. ($k+1$)mers. This way the graph, and hence the computational requirements, grows with the number of unique kmers in the library instead of the number of reads. For a more elaborate description of de Bruijn graphs for sequence assembly see \cite{Miller20211242}. The resulting contigs are constructed by following paths in the graph. The paths that can be unambiguously followed are called unitigs. Ambiguous paths can be resolved by using coverage information or paired-end information. Contigs thus consist of one or multiple unitigs. Choosing the right kmer size is important. A shorter $k$ gives more connectivity within the graph and hence requires lower sequencing coverage of the genomes, but at the same time the risk increases that a kmer occurs multiple times within a genome or in multiple genomes (hence ambiguous paths will exist). A larger $k$ can overcome this problem if it is larger than the multiply occurring region, but a larger $k$ also requires higher sequence coverage.\\

%\subsubsection*{How the assemblers differ}
Velvet, Ray and Meta-Velvet differ in the way the graph is traversed. Velvet, meant for single genomes, looks for one coverage peak in the coverage distribution and tries to follow that; the main idea is that the genome is approximately uniformly covered. Nodes in the graph below a certain coverage threshold are considered errors and nodes with high coverage are considered repeats.
Meta-Velvet looks for multiple peaks in the coverage distribution. The contigs of each genome should have a distinct coverage peak because the genome copy number of the corresponding genome differs from that of the other genomes in the metagenome. Meta-Velvet makes use of that property. Ray looks for `seeds' in the graph and extends those seeds iteratively, weighting choices by the number of reads supporting a certain path. The seeds are unitigs in the graph with a specific coverage. The metagenomic update to Ray changes the seed selection by looking at the coverage peak in the graph locally instead of globally.

%\subsection*{Merging}
A way to get the advantages of both short and long kmers is to merge contigs generated in multiple assemblies with different kmer lengths. This is possible with Newbler, as done by \cite{Luo22347999}, or with Minimus2, as done for instance by the Rnnotator pipeline \cite{Martin21106091}. Both Newbler and Minimus2 use an Overlap-Layout-Consensus method to merge contigs \cite{Sommer17324286,Miller20211242}.

%\subsection*{Scaffolding}
For the scaffolding procedure Bambus2 was chosen, since it was one of the better scaffolders for single genomes in the GAGE assessment paper \cite{Salzberg22147368} and is suitable for metagenomes as well \cite{Koren21926123}. For a flow diagram of the previously mentioned approaches see Figure \ref{fig:asmstrat}. A total of twenty-one assembly recipes from the flow diagram have been tested. See Table \ref{tab:asmstrat} for an overview of the assembly recipes, Table \ref{tab:programversions} for the version of each program and Table \ref{tab:asmstratparameters} for the parameters of each recipe.

%\clearpage
%\thispagestyle{empty}
%\begin{figure}[ht!]
% \centering
% \includegraphics[height=\textheight]{figures/metassemble-flowchart.pdf}
% \caption{Assembly recipes using a combination of Velvet, Meta-Velvet, Ray, Minimus2, Newbler and Bambus2.}
% \label{fig:asmstrat}
%\end{figure}

%TODO Validation requires some more non-ambiguous parameters for calculating
% the statistics and performing the mapping with MUMmer

\subsection*{Validation} \label{sec:metval}
The validation of a metagenomic assembly, in case a reference metagenome is available, often focuses on one or more of the following points:
\begin{itemize}
\item contig or scaffold length distribution
\item contig/scaffold coverage of the reference metagenome
\item chimericity of the contigs/scaffolds
\item functional annotation accuracy
\item phylogenetic classification accuracy
\end{itemize}
This study focuses on the first three points, since improvements in those are expected to improve the functional annotation and the phylogenetic classification.

\subsubsection*{Aligning the assembly against the reference metagenome}
To determine how well the assemblies matched the reference metagenome, the assemblies were mapped against the reference metagenome using MUMmer 3.1 \cite{Kurtz14759262}. MUMmer finds maximal exact matches longer than $l$ and clusters them if they are no more than $g$ nucleotides apart. The alignments are afterwards extended for each cluster if the combined length of its matches is at least $c$. The alignments are extended in between the matches of the cluster and on the ends using a Smith-Waterman dynamic programming algorithm. The MUMmer package contains multiple scripts that make use of this approach. NUCmer (\underline{NUC}leotide MUM\underline{mer}) is a script included in the MUMmer package for DNA sequence alignment of a set of query contigs against a set of reference contigs. The NUCmer command used was: \texttt{nucmer -{}-maxmatch -c65 -g90 -l20}, i.e. $c=65$, $g=90$ and $l=20$.
The {\em maxmatch} parameter makes sure all exact matches are used, whether they are unique or not, so contigs that consist only of a shared region or a repetitive element will be included in the alignments as well. Afterwards the script {\em show-coords} was used on the resulting alignment file to extract information about each alignment, such as its location in both the query and the reference, percent identity, percent similarity and percent of the reference and query covered. We define the purity of an alignment as the query coverage multiplied by the identity of the alignment. The {\em purity} of a contig is defined as the purity of its purest alignment. An impure contig can be the result of a rearrangement, an indel, copy number variation, inclusion of a kmer stemming from another genome or inclusion of a kmer that is a sequencing error.\\
% Results and Discussion can be combined.
\section*{Results}
In Table ?? the length statistics of the various assemblies are shown. We chose to show only assemblies with a kmer size of 31 to keep the information concise. The merged recipes are based on combining assemblies with kmer sizes from 19 up to 75 with a step size of 2.
% We only support three levels of headings, please do not create a heading level below \subsubsection.

\section*{Discussion}
\subsection*{Purity}
Figure ?? shows the number of bases in contigs over different purity intervals and contig length intervals for the even community. In terms of producing the fewest impure contigs, velvetnoscaf does best. It does not, however, deliver very long contigs; raynoscaf does better on length, at the cost of outputting more impure contigs. The metavelvetnoscaf recipe provides even more long contigs but also an even larger number of impure contigs compared to the other two noscaf recipes.
It becomes clear that one has to make a choice between length and purity when assembling with one of these recipes. All the scaf recipes result in a large increase in the number of impure contigs. For the merging recipes there is very little difference between minimus2 and newbler. In both cases there is an increase in contig lengths with a decrease in purity, but not as much as for the scaf recipes.

\subsection*{Metagenome coverage}
The metagenome coverage of the different recipes for the even community can be seen in Figure ??. The light lines are computed using only completely pure contigs, the dark lines using the purest alignment of every contig. This gives an idea of the range of the metagenome coverage when using different cutoff values for purity. The merge recipes do the best job of increasing contig lengths and coverage of the metagenome. Again there are only minor differences between minimus2 and newbler. Newbler results in slightly purer contigs. If we only looked at the light lines, i.e. counted only completely pure contigs, the merging recipes would seem to perform rather poorly. Therefore in Figure ?? we plotted several different purity cutoffs for minimusvelvetnoscaf. The plot shows that most of the metagenome coverage comes from only slightly impure contigs.

\subsection*{Kmer LCA analysis}
The impurity of a contig could come from rearrangements or from the inclusion of chimeric and/or unknown kmers. An unknown kmer might result from a sequencing error or from the input DNA being slightly different from the reference. We refer to these kmers henceforth as erroneous kmers. In Figure ?? one can see that most of the chimeric kmers come from genomes whose lowest common ancestor (LCA) is either at the species or genus level. The sum of the chimeric kmers is larger than the number of kmers not stemming from any of the reference genomes.
For velvetnoscaf31, contigs with an erroneous kmer and contigs with a chimeric kmer occur in an approximately equal ratio. There are however more than twice as many chimeric kmers, indicating that a chimeric contig often has more chimeric kmers than an erroneous contig has erroneous kmers.

\subsection*{Extracting pure contigs}
There is a plethora of ways in which one can postprocess a metagenomic assembly. Now that we have demonstrated that there is considerable impurity in metagenomic assemblies, especially for assemblers outputting longer contigs, it would be ideal to obtain a confidence score per contig that reflects its purity without a reference genome. Depending on the postprocessing desired, a confidence threshold can be chosen to only include certain contigs. We ran FRCbam and REAPR on the raynoscaf31 assembly. Unfortunately, for both reference-free validation tools we could not find a set of error indicators that would be an indication of impurity, i.e. chimericity, indels, erroneousness or rearrangements. A very simple rule of thumb is to use contig coverage as an indication of contig purity. In Figure ?? one can see that the pure bases are mostly in contigs with a high mean coverage. Figure ?? shows the relation between mean coverage and purity.
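
The rule of thumb above can be written down more explicitly. The following is only a sketch: the symbols $q_a$, $i_a$, $A(c)$, $\bar{d}(c)$ and the threshold $t$ are notation introduced here for illustration, and no particular value of $t$ is implied by the results. Writing $q_a$ for the fraction of a contig covered by alignment $a$ and $i_a$ for the identity of that alignment, both scaled to $[0,1]$, the purity defined in the Materials and Methods reads
\begin{equation}
\mathrm{purity}(c) = \max_{a \in A(c)} q_a \, i_a ,
\end{equation}
where $A(c)$ is the set of alignments of contig $c$ against the reference metagenome. As a purely hypothetical example, an alignment covering 90\% of a contig at 98\% identity gives $0.90 \times 0.98 = 0.882$. Selecting contigs on coverage then amounts to keeping the set
\begin{equation}
S_t = \{\, c : \bar{d}(c) \geq t \,\},
\end{equation}
with $\bar{d}(c)$ the mean read coverage of contig $c$ and $t$ a cutoff that would have to be chosen per data set. Stating the selection in this form makes clear that it needs only the read mapping and not the reference metagenome.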

% style file and paste the contents of your .bbl file
% here.
%
\bibliography{plos_template}
\section*{Figure Legends}
% This section is for figure legends only, do not include

%}
%\label{Figure_label}
%\end{figure}
\clearpage
\thispagestyle{empty}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/Figure1.eps}
\caption{
{\bf Figure 1. Distribution of bases in contigs over purity and length intervals for the mock community with even abundances per genome.} Three different assembly recipes are shown: velvetnoscaf31 (A), raynoscaf31 (B) and metavelvetnoscaf31 (C). The velvet recipe gives the purest contigs, but they are not very long. From the top panel to the bottom panel a trend can be noticed: longer contigs are produced at a cost of purity.}
\label{Figure_1}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/Figure2.eps}
\caption{
{\bf Figure 2. Distribution of bases in contigs over purity and length intervals for the mock community with even abundances per genome.} Three different assembly recipes are shown: raynoscaf31 (A), raynoscafminimus2 (B) and raynoscafnewbler (C). Merging Ray assemblies over kmers 19 to 75 with a step size of 2 using Minimus2 and Newbler results in longer but less pure contigs. The Newbler recipe is more stringent than Minimus2.}
\label{Figure_2}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{figures/Figure_3.eps}
\caption{
{\bf Figure 3. LCA for each kmer that did not belong to the reference genome.} Three different assembly recipes are shown: velvetnoscaf31 (A), raynoscaf31 (B) and raynoscafnewbler (C).}
\label{Figure_3}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{figures/Figure_4.eps}
\caption{
{\bf Figure 4. Type of impure contigs based on Kraken analysis.} Three different assembly recipes are shown: velvetnoscaf31 (A), raynoscaf31 (B) and raynoscafnewbler (C).}
\label{Figure_4}
\end{figure}
\section*{Tables}

%\end{flushleft}
%\label{tab:label}
% \end{table}
\begin{table}[h!]
\centering
\begin{tabular}{|l|c|c|c|}
\hline
Assembly recipe name & Contiging & Merging & Scaffolding\\
\hline
velvetnoscaf & Velvet & - & -\\
velvetscaf & Velvet & - & Velvet\\
velvetnoscafminimus2 & Velvet & Minimus2 & -\\
velvetnoscafnewbler & Velvet & Newbler & -\\
velvetnoscafbambus2 & Velvet & - & Bambus2\\
velvetnoscafminimus2bambus2 & Velvet & Minimus2 & Bambus2\\
velvetnoscafnewblerbambus2 & Velvet & Newbler & Bambus2\\
metavelvetnoscaf & Meta-Velvet & - & -\\
metavelvetscaf & Meta-Velvet & - & Meta-Velvet\\
metavelvetnoscafminimus2 & Meta-Velvet & Minimus2 & -\\
metavelvetnoscafnewbler & Meta-Velvet & Newbler & -\\
metavelvetnoscafbambus2 & Meta-Velvet & - & Bambus2\\
metavelvetnoscafminimus2bambus2 & Meta-Velvet & Minimus2 & Bambus2\\
metavelvetnoscafnewblerbambus2 & Meta-Velvet & Newbler & Bambus2\\
raynoscaf & Ray & - & -\\
rayscaf & Ray & - & Ray\\
raynoscafminimus2 & Ray & Minimus2 & -\\
raynoscafnewbler & Ray & Newbler & -\\
raynoscafbambus2 & Ray & - & Bambus2\\
raynoscafminimus2bambus2 & Ray & Minimus2 & Bambus2\\
raynoscafnewblerbambus2 & Ray & Newbler & Bambus2\\
\hline
\end{tabular}
\caption{Assembly recipes}
\label{tab:asmstrat}
\end{table}
\section*{Supporting Information Legends}
%

%\item {\bf}
%\item {\bf}
%\end{description}
\clearpage
\thispagestyle{empty}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/Figure_S1.eps}
\caption{
{\bf Figure S1. Distribution of bases in contigs over purity and length intervals for the mock community with even abundances per genome.}}
\label{Figure_S1}
\end{figure}

\clearpage
\thispagestyle{empty}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/Figure_S2.eps}
\caption{
{\bf Figure S2. Distribution of bases in contigs over purity and length intervals for the mock community with even abundances per genome.}}
\label{Figure_S2}
\end{figure}
\end{document}