Authorea

Jenna M. Lang edited Data Submission.md about 9 years ago

Commit id: 2388a34dcdebe5be35ae880ff5efd3999dfd8b02


If this runs successfully then you should see both the .fsa and .agp files in your current directory.

Important Note: NCBI considers a gap of less than 10 nucleotides to be "missing information" in a contig, not a gap between contigs (whereas A5 has no minimum gap size). Therefore, NCBI requires that contigs separated by less than 10 nucleotides be merged. This script performs that merging, meaning that the number of contigs in the .fsa file may be less than in your input file. Therefore, we recommend counting the contigs in the .fsa file. To count them in the terminal use the syntax

grep -c ">" name_of_your_.fsa_file

Important Note: If after running the fasta2agp.pl script and counting the contigs you have the same number of contigs as starting scaffolds, then you submit only the contigs as described in Section 10.1.

**Create an SBT template**

Create an SBT template file at NCBI: [http://www.ncbi.nlm.nih.gov/WebSub/template.cgi](http://www.ncbi.nlm.nih.gov/WebSub/template.cgi)

The BioProject # is the BioProject ID starting with "PRJNA" which you received above. BioSample can be left blank. When you click "Create the template", it will automatically download to your computer as template.sbt. We recommend immediately renaming the file to match your project.

**Tbl2asn**

Download the tbl2asn program from [ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/](ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/). If you are using Safari, a window will pop up asking for login information; just choose guest and unzip the version of the program that is compatible with your operating system.
Other browsers will take you to a page with a lot of tbl2asn programs; download the one compatible with your operating system. After downloading the desired command-line program, double click to uncompress the archive and rename the resulting file to tbl2asn. Now change the file permissions of the file (in the terminal) since transfer by FTP resets the permissions. Syntax is:

chmod 755 tbl2asn

Once you have changed the permissions, create a new directory and place tbl2asn along with the .sbt and .fsa files into the folder. Run the tbl2asn program using the following syntax. You will need to fill out the organism name, strain, location, collection date, and isolation source specific to your own project.

path_to_program/tbl2asn -p path_to_files -t template_file_name -M n -Z discrep -j "[organism=X] [strain=X] [country=X: city, state abbreviation] [collection_date=X] [isolation-source=X] [gcode=11]"

Following the -p is the path to the directory containing the .fsa file; following the -t is the path to and name of the .sbt template file.

Sample syntax
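
As a minimal sketch of the preparation steps above, the contig count and the staging of files for tbl2asn could be wrapped in two small shell functions. The function names and all file and directory names here are hypothetical, and the tbl2asn call is left as a comment with the same placeholder values (X) used in the syntax above.

```shell
#!/bin/sh
# Count FASTA records in a file. Every contig header line begins with ">",
# so this gives the same number as grep -c ">" on a well-formed .fsa file.
count_contigs() {
    grep -c '^>' "$1"
}

# Copy tbl2asn, the .sbt template and the .fsa file into a new directory
# and make tbl2asn executable (transfer by FTP resets the permissions).
# Arguments: tbl2asn binary, .sbt file, .fsa file, target directory.
stage_submission() {
    mkdir -p "$4" &&
    cp "$1" "$4/tbl2asn" &&
    cp "$2" "$3" "$4/" &&
    chmod 755 "$4/tbl2asn"
}

# Example with hypothetical names:
#   count_contigs my_project.fsa
#   stage_submission ~/Downloads/tbl2asn my_project.sbt my_project.fsa submission
#   cd submission
#   ./tbl2asn -p . -t my_project.sbt -M n -Z discrep \
#     -j "[organism=X] [strain=X] [country=X: city, state abbreviation] [collection_date=X] [isolation-source=X] [gcode=11]"
```

Comparing the output of count_contigs on the input scaffolds and on the .fsa file shows whether any contigs were merged.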

We recommend using Safari or Firefox for this step; in our hands, Chrome can have issues with the Java requirements for uploading files.

Go to [https://www.ebi.ac.uk/ena/about/sra_submissions](https://www.ebi.ac.uk/ena/about/sra_submissions) and create an account. Successful creation of an account should take you to the "Welcome to ENA's Sequence Read Archive (SRA) Webin submission system" screen. Click on the "New Submission" tab. Select "Submit sequence reads and experiments". Click on "Data Upload Instructions" towards the bottom of the page. This takes you to a variety of options for uploading files depending on your preference and operating system.

We use the Webin Data Uploader. Click on the link, which will download a .jnlp file. Open and run this file. Depending on your system you may have to download and install a new version of Java. On some systems you may have to right-click the .jnlp file and open with "Java Web Start". Log in using your email address and password. In the WebinDataUploader, in the blank area to the right of the Local Upload directory, navigate to the directory on your computer containing the reads (using the path as you would in the terminal). Select the file(s) containing the reads and click "Upload".

(Note that paired-end data is required to be in two separate .fastq files. If your data came as one interleaved file, then the separated .fastq files can be found in the directory where the A5 assembly was performed as [project name].raw1\_p1.fastq.gz and [project name].raw1\_p2.fastq.gz.)

Note that the only acceptable file types for submission are gzip (.gz) and bzip2 (.bz2). To gzip files in the terminal use the following syntax:

gzip [filename]

After completion, return to EMBL (the "New Submission" tab of the SRA Webin submission system) and select the "Next" button.
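
As a rough sketch of the compression step, the following shell function gzips each read file unless it already has one of the two accepted extensions. The function name and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Compress each read file with gzip unless it already ends in .gz or .bz2.
compress_reads() {
    for f in "$@"; do
        case "$f" in
            *.gz|*.bz2) ;;      # already an accepted file type, leave as is
            *) gzip "$f" ;;     # replaces the file with a .gz version
        esac
    done
}

# Example with hypothetical names:
#   compress_reads my_project_p1.fastq my_project_p2.fastq
```

Note that gzip replaces each input file with its compressed version, so keep a copy elsewhere if the uncompressed reads are still needed.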
During this process, refreshing the page or navigating away from the page will reset the form and the information will be lost.

Click "Create a New Study". Fill in descriptions of the project and proceed to the next tab. Select the appropriate metadata format, or in most cases the ENA default sample checklist at the bottom. Note that the default release date is three months from the current date; change this if the data should be released sooner.

You should now be at the "Sample" page. Required fields are listed on the right and optional additional fields can be selected from the options on the right. Fill out the appropriate fields and click on "Next".

Note: If you are submitting data for an organism that doesn’t have a Taxon ID (“Tax ID”) then you need to e-mail ENA to receive one ([email protected]). If you have already submitted the genome to NCBI then you can retrieve the Taxon ID from your BioProject page there. On the ENA page, you will be able to search for the Taxon ID and find your organism under the "Organism Details" tab, but you won’t be able to find it using the name of the organism.

On the "Sample" page, click the "+ Add" button under sample group details. Fill in the unique name under basic details, add the Tax ID if it wasn’t added previously, and click "Next".

On the "Run" page, select the appropriate data type. Fill in the required fields (they change with data type). Note: “Insert size” cannot be a range, only a number. With our 600-900bp libraries, we enter 750 here. Click "Submit" and confirm submission. You will immediately receive a confirmation email, but it takes some time before the information is actually live at the ENA links.