David Coil edited Data Submission.md  over 9 years ago

Commit id: f739db214fa53291e68d369653a056979bc26f4e

Potential problems with data submission: Sometimes submitted contigs belong to contaminating organisms, or to phiX, which is often used in sequencing. If this is the case, you will receive an e-mail from NCBI telling you which contigs to remove. After removing those contigs, you need to renumber all of your remaining contigs so that no numbers are missing from the sequence. Below is a simple command that renumbers the contigs in the cleaned file (the original file with the contaminated contigs removed) and saves them to a new file (`test.fa` is the name of your cleaned file and `test2.fa` is the name you want the renumbered file to have). Note that the command treats every odd-numbered line as a header, so it assumes each contig occupies exactly two lines: a header line followed by the entire sequence on a single line.

`cat test.fa | awk '{print (NR%2==1) ? ">contigs_" ++i : $0}' > test2.fa`
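The command above depends on each contig taking up exactly two lines. If your assembler wraps sequences across several lines, a variant that looks for the `>` character instead of counting lines may be safer. The following is a minimal sketch, not a tested submission tool: the function name `renumber_contigs` is hypothetical, and the `contigs_` prefix is simply carried over from the command above.

```shell
# Hypothetical helper: renumber FASTA headers in order of appearance.
# Argument $1 is the cleaned FASTA file; the result is written to stdout.
renumber_contigs() {
    # Lines starting with ">" are headers: replace each with ">contigs_N",
    # where N counts headers seen so far. All other lines (sequence data,
    # whether on one line or wrapped over many) are printed unchanged.
    awk '/^>/ { print ">contigs_" (++n); next } { print }' "$1"
}
```

With the file names used above, this would be run as `renumber_contigs test.fa > test2.fa`. Note that the original header text is discarded entirely, just as it is with the one-line command.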