Aim: to annotate the bacmid_WT_20160520.fa sequence with the features of E2_KM667940.gb. Two possibilities:
Modify the E2_KM667940.gb sequence itself: 8406 modifications to take into account!
Annotate the bacmid sequence using E2_KM667940.gb as the reference.
After a literature search, I chose the second option and based the strategy on RATT (Rapid Annotation Transfer Tool, doi:10.1093/nar/gkq1268).
Convert the .gb file to EMBL format. Three possibilities:
Artemis (on the Genotoul cluster). Tested: OK.
The EBI website (http://www.ebi.ac.uk/), which provides the entry in EMBL format with the same accession numbers as used by the NCBI. Tested: OK.
The genbank2embl.pl Perl script (available from the RATT website). Not tested.
Run RATT. It needs the MUMmer package, Artemis for visualization and, optionally, NCBI BLAST for alignment.
Validate the annotated sequence using ACT.
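Whichever conversion route is used, the resulting EMBL file can be sanity-checked before it is given to RATT. The sketch below is a hypothetical helper (check_embl is not part of RATT or Artemis); it assumes the standard EMBL line codes: an ID line first, feature keys on "FT   <key>" lines, an "SQ" sequence header and a closing "//".

```shell
# check_embl (hypothetical helper): minimal structural check of an EMBL flat file.
# Assumes standard EMBL line codes: ID first, feature keys on "FT   <key>" lines,
# an "SQ   " sequence header and a closing "//" line.
check_embl() {
    awk '
        NR == 1 && !/^ID   / { print "missing ID line"; bad = 1 }
        /^FT   [^ ]/         { feats++ }   # feature keys only, not qualifier lines
        /^SQ   /             { sq = 1 }
        { last = $0 }
        END {
            if (!feats)       { print "no FT feature keys"; bad = 1 }
            if (!sq)          { print "no SQ line"; bad = 1 }
            if (last != "//") { print "missing // terminator"; bad = 1 }
            if (!bad) print "OK: " feats " feature key(s)"
            exit bad
        }
    ' "$1"
}
```

For example, check_embl E2_KM667940.embl should print an OK line with the number of feature keys; any other output points to a truncated or mis-converted file.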
“Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation. Artemis is written in Java, and is available for UNIX, Macintosh and Windows systems. It can read EMBL and GENBANK database entries or sequence in FASTA, indexed FASTA or raw format. Other sequence features can be in EMBL, GENBANK or GFF format.
ACT (Artemis Comparison Tool) is a free tool for displaying pairwise comparisons between two or more DNA sequences. It can be used to identify and analyse regions of similarity and difference between genomes and to explore conservation of synteny, in the context of the entire sequences and their annotation.”
To run Artemis:
E2_KM667940.gb was opened and saved as E2_KM667940.embl from the interface. Explanation of the interface:
I installed PAGIT on PC_Analyse because it contains RATT, following the installation instructions.
Before running, source the environment settings:
Because PAGIT was installed in /usr/bin/, do:
$ sudo -s   (root shell)
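Once the environment is set up, one way to confirm that RATT and the tools it relies on are reachable is to test each name with the POSIX command -v builtin. This is a sketch with a hypothetical helper name; the executable names used in the example below (start.ratt.sh, MUMmer's nucmer and show-coords, Artemis's art and act, BLAST+'s blastn) are assumptions about this installation.

```shell
# missing_tools (hypothetical helper): print each named program that is not on $PATH.
# Returns 0 if all are found, 1 if at least one is missing.
missing_tools() {
    _missing=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool"
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, missing_tools start.ratt.sh nucmer show-coords art act should print nothing once PAGIT is sourced; blastn can be checked separately since BLAST is optional.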
I ran the test by typing the following in a terminal window:
$ bash ./dotestrun.sh
I tested RATT by transferring the annotation of the KM667940.embl reference onto KM667940.fasta, following steps 33 to 38 of the protocol publication (doi:10.1038/nprot.2012.068). The command line is stored in the cmd.txt file. All annotation elements were transferred correctly.
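The RATT call can be wrapped so that it is reproducible from a script or a Makefile. This is a sketch, not the command stored in cmd.txt: it assumes the PAGIT layout, where RATT_HOME points at the RATT installation and start.ratt.sh takes, in order, a directory of reference EMBL files, the query FASTA, a result prefix and a transfer type. The helper name run_ratt and the default transfer type Strain are illustrative choices. With DRY_RUN=1 the command is only printed.

```shell
# run_ratt (hypothetical helper): transfer annotation from one EMBL reference
# onto a query FASTA with RATT.
# Usage: run_ratt <reference.embl> <query.fasta> <result_prefix> [transfer_type]
# RATT expects a directory of EMBL files, so the reference is copied into
# <result_prefix>_embl/ first. Set DRY_RUN=1 to print the command instead.
run_ratt() {
    ref=$1 query=$2 prefix=$3 ttype=${4:-Strain}   # Strain: assumed default
    if [ -z "${RATT_HOME:-}" ]; then
        echo "RATT_HOME is not set: source the PAGIT environment first" >&2
        return 1
    fi
    embl_dir="${prefix}_embl"
    mkdir -p "$embl_dir" && cp "$ref" "$embl_dir/" || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$RATT_HOME/start.ratt.sh $embl_dir $query $prefix $ttype"
    else
        "$RATT_HOME/start.ratt.sh" "$embl_dir" "$query" "$prefix" "$ttype"
    fi
}
```

For the self-transfer test, this would be run_ratt KM667940.embl KM667940.fasta testRATT, which matches the testRATT prefix of the output files.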
The *.output.txt file gives an overview of the transferred elements.
The *.Report.txt file lists the correctly transferred annotations and, for incorrectly transferred genes, information on how they might be corrected.
The *.NOTTransfered.embl file contains the annotations that were not transferred.
The *.final.embl file contains the annotated sequence.
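A quick per-sequence summary of these outputs can be obtained by counting feature keys in the *.final.embl and *.NOTTransfered.embl files. This sketch assumes the <prefix>.<sequence name>.<suffix> naming visible in testRATT.gi_700275637_gb_KM667940.1_.final.embl; the helper name is hypothetical, and *.Report.txt remains the place to read the details.

```shell
# summarise_ratt (hypothetical helper): count transferred and non-transferred
# feature keys for every <prefix>.<sequence>.final.embl produced by RATT.
summarise_ratt() {
    prefix=$1
    found=0
    for final in "$prefix".*.final.embl; do
        [ -e "$final" ] || continue          # glob did not match anything
        found=1
        base=${final%.final.embl}
        ok_n=$(grep -c '^FT   [^ ]' "$final" || true)
        if [ -e "$base.NOTTransfered.embl" ]; then
            missed_n=$(grep -c '^FT   [^ ]' "$base.NOTTransfered.embl" || true)
        else
            missed_n=0
        fi
        echo "$(basename "$base"): $ok_n features transferred, $missed_n not transferred"
    done
    [ "$found" = 1 ] || { echo "no $prefix.*.final.embl file found" >&2; return 1; }
}
```

For the self-transfer test, summarise_ratt testRATT should report zero features not transferred.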
I compared the NCBI file with testRATT.gi_700275637_gb_KM667940.1_.final.embl generated by RATT and found no differences.
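This comparison can be scripted by extracting the feature key and location of every feature from both files and diffing the sorted lists. The helper names are hypothetical; the sketch assumes the standard column layout of both flat-file formats, ignores qualifiers and only captures the first line of locations that wrap (long join(...) locations), so it complements rather than replaces the visual check.

```shell
# gb_features / embl_features (hypothetical helpers): print "key location" for
# every feature of a GenBank or EMBL flat file, sorted for comparison.
gb_features() {
    awk '
        /^FEATURES/ { ft = 1; next }
        /^ORIGIN/   { ft = 0 }              # stop before the sequence lines
        ft && /^     [^ ]/ { print $1, $2 } # keys start in column 6
    ' "$1" | sort
}

embl_features() {
    awk '/^FT   [^ ]/ { print $2, $3 }' "$1" | sort
}

# compare_annotations: show features that differ; returns 0 when identical.
compare_annotations() {
    a=$(mktemp) && b=$(mktemp) || return 1
    gb_features "$1" > "$a"
    embl_features "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Usage: compare_annotations <NCBI file>.gb testRATT.gi_700275637_gb_KM667940.1_.final.embl prints nothing and returns 0 when every key and location matches.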
To visualise the result, I also used Artemis, and I compared it with the NCBI file using ACT.
I converted the *.final.embl file into a *.final.gb file using Artemis. The header was lost, so the *.final.gb file could not be loaded into ApE. I added the LOCUS line at the beginning and checked that the ORIGIN line was present; the modified file then opened, but the annotations were lost. Solution: use the EMBOSS sequence format conversion program seqret (http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/) on the command line, so that it can be included in a Makefile. The command line is stored in the cmd.txt file. The resulting .gb file works in ApE.
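The points checked by hand here (a LOCUS line at the top, an ORIGIN line before the sequence) can be tested automatically before opening a converted file in ApE, together with a count of feature keys, since the annotations must survive the conversion. This is a sketch with a hypothetical helper name, assuming the standard GenBank column layout.

```shell
# check_genbank (hypothetical helper): verify that a GenBank flat file starts
# with a LOCUS line, has feature keys in its FEATURES table, contains an
# ORIGIN line and ends with "//".
check_genbank() {
    awk '
        NR == 1 && !/^LOCUS / { print "first line is not a LOCUS line"; bad = 1 }
        /^FEATURES/           { ft = 1; next }
        /^ORIGIN/             { ft = 0; origin = 1 }
        ft && /^     [^ ]/    { feats++ }      # feature keys start in column 6
        { last = $0 }
        END {
            if (!feats)       { print "no feature keys"; bad = 1 }
            if (!origin)      { print "no ORIGIN line"; bad = 1 }
            if (last != "//") { print "missing // terminator"; bad = 1 }
            if (!bad) print "OK: " feats " feature key(s)"
            exit bad
        }
    ' "$1"
}
```

A file that fails on the LOCUS line reproduces the Artemis header problem; a file that passes the header checks but reports no feature keys has lost its annotations.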
10/10/2016: the same pipeline was applied to Bacmid_WT_20160615. The command lines were stored in cmd.txt files covering the RATT + EMBL conversion and the BLAST visualisation.
To streamline the annotation pipeline, all the command lines used will be gathered in a Makefile. The steps:
1. Annotate with RATT (input: EMBL file).
2. Verify the annotation using ART and BLAST/ACT.
3. Convert the *.final.embl file to a *.gb file.
4. If needed, modify the header of the *.gb file using a text editor.
Steps 1, 2 and 3 are integrated in the Makefile, which was successfully tested with Bacmid_WT_20160615.
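A Makefile rebuilds a target only when it is missing or older than its prerequisite. The same rule can be sketched in shell for step 3, for instance to rerun the conversion outside make. The helper names are hypothetical, the *.final.embl naming follows the RATT outputs described earlier, and the seqret qualifiers (-sequence, -feature, -osformat, -outseq) are the usual EMBOSS ones but should be checked against the installed version and the command kept in cmd.txt. With DRY_RUN=1 the commands are only printed.

```shell
# needs_rebuild (hypothetical helper): succeed when <target> is missing or
# older than any of the given prerequisites, as make does.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# run_cmd: execute a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# convert_final (hypothetical helper): step 3, convert every
# <prefix>.*.final.embl into GenBank with EMBOSS seqret, keeping the features.
convert_final() {
    prefix=$1
    for embl in "$prefix".*.final.embl; do
        [ -e "$embl" ] || continue
        gb=${embl%.embl}.gb
        if needs_rebuild "$gb" "$embl"; then
            run_cmd seqret -sequence "$embl" -feature -osformat genbank -outseq "$gb"
        else
            echo "up to date: $gb"
        fi
    done
}
```

Running convert_final testRATT twice converts the files on the first call and reports them as up to date on the second, which is the behaviour expected from the Makefile rule.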