Authorea

Alberto Pepe edited Library Preparation and Sequencing .md about 9 years ago

Commit id: ba2b9be163daf6734e9bd98c6a9fa5859b4970bd


As described above, current Illumina sequencing systems have much greater capacity than is needed to sequence a single genome, so it is generally beneficial to combine many samples into a single run of a machine. Unfortunately, in our experience sequencing facilities typically will not help coordinate such pooling (we assume because they do not want to oversee the pooling or deal with the associated accounting hassles), so it is usually up to the users to carry out this coordination. Though this can sometimes be complicated, it is generally worthwhile: one can pool many genomes or metagenomes into a single run and still get enough data for each project, making the sequencing cost per project significantly lower. For this to work well, one needs to coordinate the use of barcodes to tag each sample, coordinate the pooling, and have available the informatics required to "demultiplex" the samples from each other.

##Downsampling

Coverage (also known as read depth) is the average number of reads representing a given nucleotide. It is a function of the number and size of genomes pooled onto a run and the number and length of reads. The optimal amount of coverage depends on the read length, the assembler being used, and other factors. For Illumina data assembled using this workflow, we recommend coverage between 20x and 200x. See our more detailed discussion in section 9.1.3 "Interpretation of A5-miseq stats".

If you have coverage significantly higher than 200x and wish to downsample your data, we have written a script (sub\_sample\_reads) for this purpose. Downsampling should not be necessary if following the assembly instructions in this workflow. If downsampling, you will first need to calculate how many reads you want the script to sample.
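The read count follows from the coverage definition above: coverage is approximately the number of reads multiplied by the read length, divided by the genome size. The following is a minimal Python sketch of that arithmetic. The function names and the example values (a 5 Mbp genome, 250 bp reads, 8 million reads) are illustrative assumptions and are not part of sub\_sample\_reads; for paired-end data, this sketch assumes each read of a pair is counted separately.

```python
import math


def estimated_coverage(num_reads, read_length, genome_size):
    """Approximate average coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(genome_size, read_length, target_coverage):
    """Number of reads needed to reach target_coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example values; substitute your own genome and run.
    genome_size = 5_000_000   # bases
    read_length = 250         # average read length in bases
    total_reads = 8_000_000   # reads obtained for this sample

    current = estimated_coverage(total_reads, read_length, genome_size)
    print(f"Estimated coverage: {current:.0f}x")

    # Only consider downsampling when coverage is well above 200x.
    if current > 200:
        target = reads_for_coverage(genome_size, read_length, 100)
        print(f"Reads to sample for 100x: {target}")
```

The genome size is often only an estimate before assembly, so the resulting read count should be treated as approximate.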
We recommend determining how many reads would be equivalent to 100x coverage (divide the genome size by the average read length and multiply by 100). You can download the script from the zipped script file found on Figshare \cite{9a5f8181-40cb-45b4-8f8c-d2abfe9c8cff} \cite{47b41cbb-81bb-44cb-8430-1218ddad365c}. Create a new directory containing the script (sub\_sample\_reads) and the reads you wish to downsample. To downsample the data, navigate to that directory in the terminal and use the following command: