Jenna M. Lang edited Data Submission.md  almost 10 years ago

Commit id: 0001f952656a8126a8296d40f083c407b6b7323c


#Data Submission

This section describes how to submit contigs and scaffolds (if applicable) as a Whole Genome Shotgun (WGS) submission to GenBank. We also recommend allowing NCBI to annotate the genome, since submitting RAST annotations to GenBank can be prohibitively complicated. Genomes submitted to GenBank are automatically shared with the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL). In addition, genomes from GenBank are automatically pulled into the Integrated Microbial Genomes (IMG) database hosted at the Joint Genome Institute (JGI), and are annotated there as well. This section also describes how to submit the raw reads. In this case we use the European Nucleotide Archive (ENA) for ease of use, but the reads will be automatically incorporated into the Sequence Read Archive (SRA) at NCBI as well.

GenBank submission requires a .sqn file containing the contigs and an .agp file describing the scaffolds (if applicable). A5 outputs a .fasta file of scaffolds, which can be converted to a .fsa and a .agp file through a command line script included in the A5 program package. The .fsa file, along with a .sbt template file (created on the NCBI website), can then be converted to a .sqn file via a script available through NCBI.

Before submitting your genome, you will need to create a BioProject at NCBI.

First, navigate to http://www.ncbi.nlm.nih.gov. Create an account and/or log in (a Google or NIH login can be used). Then, create a BioProject at NCBI by navigating to https://submit.ncbi.nlm.nih.gov/subs/bioproject/ and clicking on "New submission." Fill in the information for the submitter. Below, in italics, are the responses that we typically give for a genome sequencing project.

+ Project type
+ Project data type-_genome sequencing_

+ Biosample-_blank_
+ Publications-_blank_

Once the project is submitted, refresh the page and copy down the BioProject ID (it starts with "PRJNA").

##FASTA2AGP

To finish this submission you will need the files described below.

In the terminal, navigate to the directory containing your scaffolds file.

Run the fasta2agp.pl script included with A5 on the scaffold file output by the A5 assembly, "my\_scaffolds.fasta".

Syntax is:

    perl fasta2agp.pl my_scaffolds.fasta > my_scaffolds.agp

e.g.:

    perl /Users/Madison/Desktop/a5_miseq_macOS_20140113/bin/fasta2agp.pl /Users/Madison/Desktop/a5_miseq_macOS_20140113/example/phiX.a5.final.scaffolds.fasta > phiX.a5.scaffolds.agp

If this runs successfully, you should see both the .fsa and .agp files in your current directory.

Important Note: NCBI considers a gap of fewer than 10 nucleotides to be "missing information" in a contig, not a gap between contigs (whereas A5 has no minimum gap size). Therefore NCBI requires that contigs separated by fewer than 10 nucleotides be merged. This script performs that merging, meaning that the number of contigs in the .fsa file may be smaller than the number in your input file. We therefore recommend counting the contigs in the .fsa file.

To count them in the terminal, use the syntax:

    grep -c ">" name_of_your_.fsa_file

Important Note: If, after running the fasta2agp.pl script and counting the contigs, you have the same number of contigs as starting scaffolds, then you should submit only the .sqn file to GenBank and state that scaffolding did not take place (otherwise NCBI will reject the .agp file).

##Create a .sbt template

Create a .sbt template file at NCBI