Jenna M. Lang edited Library Preparation and Sequencing .md  almost 10 years ago

Commit id: 5bdb93bf6c6c624235c24672561177c13cc1eaa6


When growing bacteria in culture, as described in this workflow, it should almost always be possible to get enough DNA to use PCR-free TruSeq and therefore minimize library preparation biases in the genome assembly.

##Considerations in Library Preparation

Insert size: The tradeoff with insert size is between utility for assembly (larger is better) and the ability of those fragments to amplify on the Illumina flowcell for sequencing (smaller is better). The optimal fragment size also depends on the length of the reads used (with longer read lengths, longer insert sizes are required for scaffolding). The final consideration is the amount of DNA available for sequencing. While having all inserts be exactly 750bp might be ideal, such a stringent size selection could result in the recovery of only a very small amount of DNA. In our lab, with paired-end 300bp (PE300bp) reads on the Illumina MiSeq, we shoot for an insert size range of 600-900bp. Different sequencing facilities have different opinions on this topic, and it is worth having a discussion with your sequencing facility's point of contact before making any libraries.

##Multiplexing

The capacity of an Illumina MiSeq with PE300bp reads is around 15 Gb, which would result in a coverage of roughly 4300X for a typical bacterium with a 3.5Mb genome. On the HiSeq with PE125bp reads, this would be over 14,000X coverage. Currently, the recommended coverage for a bacterial genome assembly is 30-100X, depending on the choice of assembler. Therefore, sequencing a single bacterial genome on a full MiSeq or HiSeq run is a significant waste of money and reagents. Furthermore, current genome assembly algorithms do not perform well given an excess of data and require down-sampling (i.e., throwing away data) to achieve the recommended coverage for assembly.
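The coverage arithmetic above can be sketched in a few lines of Python. This is only a rough planning aid, not part of the workflow itself: the function names are hypothetical, the run yield (15 Gb) and genome size (3.5 Mb) are the figures quoted in the text, and the calculation assumes that reads are split perfectly evenly across pooled samples, which real runs rarely achieve.

```python
def coverage(run_yield_bp, genome_size_bp, n_samples=1):
    """Expected mean coverage per genome, assuming an even split across samples."""
    return run_yield_bp / (genome_size_bp * n_samples)


def max_samples(run_yield_bp, genome_size_bp, target_coverage):
    """Largest number of equally sized genomes that still reach the target coverage."""
    return int(run_yield_bp // (genome_size_bp * target_coverage))


def downsample_fraction(actual_coverage, target_coverage):
    """Fraction of reads to keep so that coverage drops to the target (at most 1.0)."""
    return min(1.0, target_coverage / actual_coverage)


# Figures taken from the text; both are approximations.
MISEQ_PE300_YIELD_BP = 15e9  # ~15 Gb per PE300bp MiSeq run
GENOME_SIZE_BP = 3.5e6       # typical 3.5 Mb bacterial genome

single = coverage(MISEQ_PE300_YIELD_BP, GENOME_SIZE_BP)
print(round(single))                                     # 4286, i.e. roughly 4300X
print(max_samples(MISEQ_PE300_YIELD_BP, GENOME_SIZE_BP, 100))  # 42
print(downsample_fraction(single, 100))                  # about 0.023 of the reads kept
```

Under these idealized assumptions, about 42 genomes would fit on one MiSeq run at 100X each. In practice, pooling is uneven and genome sizes vary, so leaving a generous margin below that ceiling is sensible.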
We typically multiplex 10-20 genomes on a PE300bp MiSeq run and many more on a HiSeq run. If using a kit for library prep, multiplexing is quite straightforward, since a number of barcoded adaptors come with the kit. Demultiplexing can be performed by the sequencing facility.

##Collaborations

Given the overcapacity of Illumina sequencing for bacterial genomes, sequencing a single genome presents a problem (unless you are willing to pay the ~$2000 total cost and throw away most of the data). Sequencing facilities will typically not "pool" samples from multiple groups because they don't want to oversee the pooling or deal with the associated billing hassles. In this case, collaborating with other groups is the most logical option. Many labs sequence genomes or metagenomes on a regular basis, so adding one additional sample isn't very difficult.