Jenna M. Lang edited Data Submission.md  about 9 years ago

Commit id: 2388a34dcdebe5be35ae880ff5efd3999dfd8b02


If this runs successfully then you should see both the .fsa and .agp files in your current directory.

Important Note: NCBI considers a gap of less than 10 nucleotides to be "missing information" in a contig, not a gap between contigs (whereas A5 has no minimum gap size). Therefore, NCBI requires that contigs separated by less than 10 nucleotides be merged. This script performs that merging, meaning that the number of contigs in the .fsa file may be lower than in your input file. We therefore recommend counting the contigs in the .fsa file. To count them in the terminal use the syntax

    grep -c ">" name_of_your_.fsa_file

Important Note: If after running the fasta2agp.pl script and counting the contigs you have the same number of contigs as starting scaffolds, then you submit only the contigs as described in Section 10.1.

**Create a SBT template**

Create a SBT template file at NCBI [http://www.ncbi.nlm.nih.gov/WebSub/template.cgi](http://www.ncbi.nlm.nih.gov/WebSub/template.cgi)

The BioProject # is the BioProject ID starting with "PRJNA" which you received above. BioSample can be left blank. When you click "Create template", it will automatically download to your computer as template.sbt. We recommend immediately renaming the file to match the project.

**Tbl2asn**

Download the tbl2asn program from [ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/](ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/). If you are using Safari, a window will pop up asking for login information; just choose guest and unzip the version of the program that is compatible with your operating system.
Other browsers will take you to a page listing many tbl2asn programs; download the one compatible with your operating system.

After downloading the desired command-line program, double click to uncompress the archive and rename the resulting file to tbl2asn. Now change the permissions of the file (in the terminal), since transfer by FTP resets them. Syntax is:

    chmod 755 tbl2asn

Once you have changed the permissions, create a new directory and place tbl2asn along with the .sbt file and .fsa files into the folder. Run the tbl2asn program using the following syntax. You will need to fill out the organism name, strain, location, collection date, and isolation source specific to your own project.

    path_to_program/tbl2asn -p path_to_files -t template_file_name -M n -Z discrep -j "[organism=X] [strain=X] [country=X: city, state abbreviation] [collection_date=X] [isolation-source=X] [gcode=11]"

Following the -p is the path to the directory containing the .fsa file; following the -t is the path to and name of the .sbt template file.

Sample syntax
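As a rough illustration of the counting and tbl2asn steps described above, the sketch below builds a tiny stand-in .fsa file, counts its contigs, and assembles the tbl2asn command without running it. Every name and value here (the temporary directory, `my_project`, the organism, strain, location, date, and isolation source) is a hypothetical placeholder, not taken from the original project. Only the flags already shown above (-p, -t, -M n, -Z discrep, -j) are used. The count uses `'^>'` rather than `">"` so that only lines beginning with `>` are matched, which is a small assumption about what you want to count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory and file names; substitute your own.
workdir="$(mktemp -d)"
template="$workdir/my_project.sbt"

# Tiny stand-in for a real .fsa file so the count has something to read.
printf '>contig1\nACGT\n>contig2\nGGCC\n' > "$workdir/my_project.fsa"

# Count contigs: one FASTA header line (starting with ">") per contig.
contig_count="$(grep -c '^>' "$workdir/my_project.fsa")"
echo "contigs: $contig_count"

# Source qualifiers passed to -j (all values are made-up examples).
qualifiers='[organism=Escherichia coli] [strain=XYZ1] [country=USA: Davis, CA] [collection_date=2014-01-15] [isolation-source=soil] [gcode=11]'

# Assemble the command as an array so the quoted -j string stays one argument.
cmd=(./tbl2asn -p "$workdir" -t "$template" -M n -Z discrep -j "$qualifiers")

# Dry run: print the command instead of executing it.
printf '%q ' "${cmd[@]}"
echo
```

To actually run the program, replace the final `printf` line with `"${cmd[@]}"` once the placeholder values and paths point at your real files.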

We recommend using Safari or Firefox for this step; in our hands Chrome can have issues with the Java requirements for uploading files.

Go to [https://www.ebi.ac.uk/ena/about/sra_submissions](https://www.ebi.ac.uk/ena/about/sra_submissions) and create an account. Successful creation of an account should take you to the "Welcome to ENA's Sequence Read Archive (SRA) Webin submission system" screen.

Click on the "New Submission" tab. Select "Submit sequence reads and experiments". Click on "Data Upload Instructions" towards the bottom of the page. This takes you to a variety of options for uploading files depending on your preference and operating system. We use the Webin Data Uploader. Click on the link, which will download a .jnlp file. Open and run this file. Depending on your system you may have to download and install a new version of Java. On some systems you may have to right-click the .jnlp file and open with "Java Web Start". Log in using your email address and password.

In the WebinDataUploader, in the blank area to the right of the Local Upload directory, navigate to the directory on your computer containing the reads (using the path as you would in the terminal). Select the file(s) containing the reads and click "Upload". (Note that paired-end data is required to be in two separate .fastq files. If your data came as one interleaved file, then the separated .fastq files can be found in the directory where the A5 assembly was performed as [project name].raw1\_p1.fastq.gz and [project name].raw1\_p2.fastq.gz.)

Note that the only acceptable file types for submission are gzip (.gz) and bzip2 (.bz2). To gzip files in the Terminal use the following syntax:

    gzip [filename]

After completion, return to EMBL (the "New Submission" tab of the SRA Webin submission system) and select the "Next" button.
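The compression step above can be sketched as a small loop that gzips every uncompressed .fastq file in a directory and then checks each archive. The directory and the `demo_p1`/`demo_p2` file names are hypothetical stand-ins created only so the example runs on its own; point `reads_dir` at your own reads instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical directory holding the paired-end reads.
reads_dir="$(mktemp -d)"

# Stand-in paired-end files, one record each, so the loop has input.
printf '@read1/1\nACGT\n+\nIIII\n' > "$reads_dir/demo_p1.fastq"
printf '@read1/2\nTGCA\n+\nIIII\n' > "$reads_dir/demo_p2.fastq"

# Compress every uncompressed .fastq file; gzip replaces each file with file.gz.
for f in "$reads_dir"/*.fastq; do
  gzip "$f"
done

# Verify the integrity of each archive before uploading.
for f in "$reads_dir"/*.fastq.gz; do
  gzip -t "$f"
  echo "ok: $(basename "$f")"
done
```

Because `gzip` removes the uncompressed original by default, keep a separate copy first if you still need the plain .fastq files.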
During this process, refreshing the page or navigating away from the page will reset the form and the information will be lost.

Click "Create a New Study". Fill in descriptions of the project and proceed to the next tab. Select the appropriate metadata format, or in most cases the ENA default sample checklist at the bottom. Note that the default release date is three months from the current date; change this if the data should be released sooner.

You should now be at the "Sample" page. Required fields are listed on the right and optional additional fields can be selected from the options on the right. Fill out the appropriate fields and click on "Next".

Note: If you are submitting data for an organism that doesn't have a Taxon ID ("Tax ID") then you need to e-mail ENA to receive one ([email protected]). If you have already submitted the genome to NCBI then you can retrieve the Taxon ID from your BioProject page there. On the ENA page, you will be able to search for the Taxon ID and find your organism under the "Organism Details" tab, but you won't be able to find it using the name of the organism.

On the "Sample" page, click the "+ Add" button under sample group details. Fill in the unique name under basic details, add the Tax ID if it wasn't added previously, and click "Next".

On the "Run" page, select the appropriate data type. Fill in the required fields (they change with data type).

Note: "Insert size" cannot be a range, only a number. With our 600-900bp libraries, we enter 750 here.

Click "Submit" and confirm submission. You will immediately receive a confirmation email, but it takes some time before the information is actually live at the ENA links.
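For the "Insert size" field on the "Run" page, one way to arrive at a single number from a size range is to take the midpoint. That is an assumption about how the 750 in the note above was chosen; the text does not say. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Library insert-size range in base pairs (values from the note above).
min_insert=600
max_insert=900

# Midpoint of the range, using integer arithmetic.
insert_size=$(( (min_insert + max_insert) / 2 ))
echo "$insert_size"   # prints 750
```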