diff --git a/Data Submission.md b/Data Submission.md
index 9cf07e1..924c017 100644
--- a/Data Submission.md
+++ b/Data Submission.md
...
If this runs successfully then you should see both the .fsa and .agp files in your current directory.
Important Note: NCBI considers a gap of less than 10 nucleotides to be "missing information" in a contig, not a gap between contigs (whereas A5 has no minimum gap size).
Therefore, NCBI requires that contigs separated by less than 10 nucleotides be merged. This script performs that merging, meaning that the number of contigs in the .fsa file may be less than in your input file.
Therefore, we recommend counting the contigs in the .fsa file:
To count them in the terminal, use the syntax:
grep -c ">" name_of_your_.fsa_file
Important Note: If, after running the fasta2agp.pl script and counting the contigs, you have the same number of contigs as starting scaffolds, then you submit only the contigs as described in Section 10.1.
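The comparison described in the note above can be sketched in the terminal as follows. This is only an illustration: the two file names are hypothetical placeholders, and the pattern `^>` (header lines that begin with `>`) is a slightly stricter variant of the `grep -c ">"` command shown earlier.

```shell
# Count FASTA records by counting header lines (lines beginning with ">").
count_records() {
    grep -c "^>" "$1"
}

# Hypothetical file names; substitute your own scaffold file and .fsa output.
scaffolds="my_project.scaffolds.fasta"
contigs="my_project.fsa"

if [ -f "$scaffolds" ] && [ -f "$contigs" ]; then
    n_scaffolds=$(count_records "$scaffolds")
    n_contigs=$(count_records "$contigs")
    echo "scaffolds: $n_scaffolds  contigs: $n_contigs"
    if [ "$n_scaffolds" -eq "$n_contigs" ]; then
        echo "Counts match: submit only the contigs."
    else
        echo "Counts differ: continue with the steps in this section."
    fi
fi
```

If either file is missing, the snippet prints nothing, so it is safe to paste before the real file names are filled in.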
**Create a SBT template**
Create a SBT template file at NCBI
[http://www.ncbi.nlm.nih.gov/WebSub/template.cgi](http://www.ncbi.nlm.nih.gov/WebSub/template.cgi)
The BioProject # is the BioProject ID starting with "PRJNA" which you received above. BioSample can be left blank.
When you click "Create the template", it will automatically download to your computer as template.sbt. We recommend immediately renaming the file to match the appropriate project.
**Tbl2asn**
Download the tbl2asn program from
[ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/](ftp://ftp.ncbi.nih.gov/toolbox/ncbi\_tools/converters/by\_program/tbl2asn/).
If you are using Safari, a window will pop up asking for login information; choose guest and unzip the version of the program that is compatible with your operating system. Other browsers will take you to a page listing many tbl2asn builds; download the one compatible with your operating system.
After downloading the desired command-line program, double click to uncompress the archive and rename the resulting file to tbl2asn. Now change the permissions of the file (in the terminal), since transfer by FTP resets the permissions.
Syntax is:
chmod 755 tbl2asn
Once you have changed the permissions, create a new directory and place tbl2asn along with the .sbt file and .fsa files into the folder.
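The staging step above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the function name `stage_submission` and the example file and directory names are hypothetical, and the function simply copies whatever files it is given.

```shell
# Copy tbl2asn and the submission files into one working directory.
# Usage: stage_submission TARGET_DIR FILE [FILE ...]
stage_submission() {
    dir="$1"
    shift
    mkdir -p "$dir"
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp "$f" "$dir"/
        else
            echo "missing: $f" >&2
        fi
    done
    # Make sure the staged copy of tbl2asn is executable.
    if [ -f "$dir/tbl2asn" ]; then
        chmod 755 "$dir/tbl2asn"
    fi
}

# Example call with hypothetical names:
# stage_submission submission tbl2asn my_project.sbt my_project.fsa
```

Keeping only the .sbt and .fsa files in that directory means the `-p` path passed to tbl2asn points at exactly the files intended for submission.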
Run the tbl2asn program using the following syntax. You will need to fill out the organism name, strain, location, collection date,
and isolation source specific to your own project.
path_to_program/tbl2asn -p path_to_files -t template_file_name
-M n -Z discrep -j "[organism=X] [strain=X] [country=X: city,
state abbreviation] [collection_date=X] [isolation-source=X] [gcode=11]"
Following the -p is the path to the directory containing the .fsa file; following the -t is the path to and name of the .sbt template file.
Sample syntax
...
We recommend using Safari or Firefox for this step; in our hands Chrome can have issues with the Java requirements for uploading files.
Go to [https://www.ebi.ac.uk/ena/about/sra_submissions](https://www.ebi.ac.uk/ena/about/sra_submissions) and create an account.
Successful creation of an account should take you to the "Welcome to ENA's Sequence Read Archive (SRA) Webin submission system" screen.
Click on the "New Submission" tab.
Select "Submit sequence reads and experiments".
Click on "Data Upload Instructions" towards the bottom of the page.
This takes you to a variety of options for uploading files depending on your preference and operating system. We use the Webin Data Uploader. Click on the link, which will download a .jnlp file. Open and run this file. Depending on your system you may have to download and install a new version of Java. On some systems you may have to right-click the .jnlp file and open with “Java Web Start”.
Log in using your email address and password.
In the WebinDataUploader, in the blank area to the right of the Local Upload directory, navigate to the directory on your computer containing the reads (using the path as you would in the terminal).
Select the file(s) containing the reads and click "Upload".
(Note that paired-end data is required to be in two separate .fastq files. If your data came as one interleaved file, then the separated .fastq files can be found in the directory where the A5 assembly was performed as [project name].raw1\_p1.fastq.gz and [project name].raw1\_p2.fastq.gz.)
Note that the only acceptable file types for submission are gzip (.gz) and bzip2 (.bz2). To gzip files in the Terminal use the following syntax:
gzip [filename]
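When several read files need compressing, the gzip step can be wrapped in a small loop. The sketch below is illustrative only: the function name `compress_reads` and the example file names are hypothetical. It uses `gzip -t` to check the integrity of each archive after it is written.

```shell
# Compress each read file with gzip and verify the resulting archive.
# Files that already end in .gz or .bz2 are left untouched.
compress_reads() {
    for f in "$@"; do
        case "$f" in
            *.gz|*.bz2)
                echo "already compressed: $f"
                ;;
            *)
                # gzip replaces FILE with FILE.gz; gzip -t tests the archive.
                gzip "$f" && gzip -t "$f.gz" && echo "ok: $f.gz"
                ;;
        esac
    done
}

# Example call with hypothetical names:
# compress_reads my_project_p1.fastq my_project_p2.fastq
```

Note that gzip removes the uncompressed original by default, so keep a copy elsewhere if the raw .fastq files are still needed.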
After completion, return to EMBL (the "New Submission" tab of the SRA Webin submission system) and select the "Next" button. During this process, refreshing the page or navigating away from the page will reset the form and the information will be lost.
Click "Create a New Study". Fill in descriptions of the project and proceed to the next tab. Select the appropriate metadata format, or in most cases the ENA default sample checklist at the bottom. Note that the default release date is three months from the current date; change this if the data should be released sooner.
You should now be at the "Sample" page. Required fields are listed on the right and optional additional fields can be selected from the options on the right. Fill out the appropriate fields and click on "Next".
Note: If you are submitting data for an organism that doesn’t have a Taxon ID (“Tax ID”) then you need to e-mail ENA to receive one ([email protected]). If you have already submitted the genome to NCBI then you can retrieve the Taxon ID from your BioProject page there. On the ENA page, you will be able to search for the Taxon ID and find your organism under the "Organism Details" tab but you won’t be able to find it using the name of the organism.
On the "Sample" page, click the "+ Add" button under sample group details. Fill in the unique name under basic details, add the Tax ID if it wasn’t added previously, and click "Next".
On the "Run" page, select the appropriate data type. Fill in the required fields (they change with data type).
Note: “Insert size” cannot be a range, only a number.
With our 600-900bp libraries, we enter 750 here.
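The value 750 is the midpoint of the 600-900 bp range. Assuming the midpoint is an acceptable single value for your own library (an assumption, not an ENA rule stated here), it can be computed in the terminal like this:

```shell
# Midpoint of a library size range, using the 600-900 bp example from the text.
min_bp=600
max_bp=900
insert_size=$(( (min_bp + max_bp) / 2 ))
echo "$insert_size"   # prints 750
```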
Click "Submit" and confirm submission. You will immediately receive a confirmation email, but it takes some time before the information is actually live at the ENA links.