PostStage

Annotation strategy

23/09/2016

Aim: to annotate the bacmid_WT_20160520.fa sequence with the features of E2_KM667940.gb. Two possibilities:

  1. Modify the E2_KM667940.gb sequence. 8406 modifications to take into account!

  2. Annotate the bacmid sequence using E2_KM667940.gb as reference.

    After a literature search, I decided to develop a strategy following the second option, based on RATT (Rapid Annotation Transfer Tool, doi:10.1093/nar/gkq1268).

    (a) Convert the .gb file to EMBL format. Three possibilities:

      i. Use Artemis (Genotoul cluster). Tested, OK.

      ii. Use the EBI (http://www.ebi.ac.uk/, EMBL format with the same accession numbers as used by the NCBI). Tested, OK.

      iii. Use the script genbank2embl.pl (Perl script available from the RATT website). Not tested.

    (b) Run RATT. It needs the MUMmer package, Artemis for visualisation and, optionally, NCBI BLAST for alignment.

    (c) Validate the annotated sequence using ACT.

    Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation. Artemis is written in Java, and is available for UNIX, Macintosh and Windows systems. It can read EMBL and GENBANK database entries or sequence in FASTA, indexed FASTA or raw format. Other sequence features can be in EMBL, GENBANK or GFF format.

    ACT (Artemis Comparison Tool) is a free tool for displaying pairwise comparisons between two or more DNA sequences. It can be used to identify and analyse regions of similarity and difference between genomes and to explore conservation of synteny, in the context of the entire sequences and their annotation.

    Artemis

    To run Artemis: /usr/local/bioinfo/src/Artemis/artemis/art
    E2_KM667940.gb was opened and saved as E2_KM667940.embl in the interface. Explanation of the interface: https://www.researchgate.net/figure/232715025_fig2_Overview-of-Artemis-genome-browser-Description-of-the-tracks-used-for-data-analysis-in

    I installed PAGIT on the PC_Analyse because it contains RATT, following the installation instructions. Before running, the environment settings must be sourced: source PAGIT/sourceme.pagit
    Because I installed PAGIT in /usr/bin/, become root first (sudo -s) and then source the file.

    I ran the test by typing the following in a terminal window: $ bash ./dotestrun.sh

    doi:10.1038/nprot.2012.068

    Annotations

    07/10/2016

    Test

    I tested RATT by transferring the annotation of the KM667940.embl reference onto KM667940.fasta. I followed steps 33 to 38 of the reference publication (doi:10.1038/nprot.2012.068). The command line is stored in the cmd.txt file. All the annotation elements were correctly transferred.

    (a) The *.output.txt file contains an overview of the transferred elements.

    (b) The *.Report.txt file contains the correctly transferred annotations, plus information on incorrectly transferred genes and how they might be corrected.

    (c) The *.NOTTransfered.embl file contains the annotations that were not transferred.

    (d) The *.final.embl file contains the annotated sequence.
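A quick check that a RATT run produced the four kinds of output listed above can be sketched as follows. This is only an illustration: the suffixes come from the list above, while the function names and the idea of scanning a single result directory are assumptions of this sketch.

```python
from pathlib import Path

# Suffixes taken from the list above; the descriptions are paraphrased.
RATT_SUFFIXES = {
    ".output.txt": "overview of the transferred elements",
    ".Report.txt": "transfer report",
    ".NOTTransfered.embl": "annotations that were not transferred",
    ".final.embl": "annotated sequence",
}


def classify_ratt_outputs(result_dir):
    """Map each known suffix to the sorted file names found in result_dir."""
    found = {suffix: [] for suffix in RATT_SUFFIXES}
    for path in sorted(Path(result_dir).iterdir()):
        if not path.is_file():
            continue
        for suffix in RATT_SUFFIXES:
            if path.name.endswith(suffix):
                found[suffix].append(path.name)
    return found


def missing_outputs(result_dir):
    """Return the suffixes for which no file was found (hypothetical helper)."""
    outputs = classify_ratt_outputs(result_dir)
    return [suffix for suffix, names in outputs.items() if not names]
```

An empty list from missing_outputs() only says that each file type exists; the content of the report still has to be read.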

    10/10/2016

    I compared the NCBI file with the testRATT.gi_700275637_gb_KM667940.1_.final.embl file generated by RATT. I did not find any differences.
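The comparison itself is not detailed here. One possible way to do it, assuming both files are in EMBL format, is to compare their feature-table lines (the lines starting with FT). The sketch below is an assumption about how such a check could look, not the method actually used; the function names are hypothetical.

```python
def feature_lines(embl_text):
    """Return the FT (feature table) lines of an EMBL entry.

    Runs of whitespace are collapsed so that layout differences
    between tools do not show up as differences.
    """
    return [
        " ".join(line.split())
        for line in embl_text.splitlines()
        if line.startswith("FT")
    ]


def compare_features(reference_text, transferred_text):
    """Return (lines only in the reference, lines only in the transferred file)."""
    ref = feature_lines(reference_text)
    new = feature_lines(transferred_text)
    ref_set, new_set = set(ref), set(new)
    only_ref = [line for line in ref if line not in new_set]
    only_new = [line for line in new if line not in ref_set]
    return only_ref, only_new
```

Two empty lists mean the feature tables contain the same lines. The check ignores the order of the features and the sequence itself, so it complements a visual comparison rather than replacing it.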

    To visualise the result, I also used Artemis, and I compared it with the NCBI file using ACT.

    I converted the *.final.embl file into a *.final.gb file using Artemis. The header was lost, so the *.final.gb file could not be loaded into ApE. I added the LOCUS line at the beginning and verified that the ORIGIN line was present. The modified file could then be opened, but the annotations were lost. Solution: use the EMBOSS sequence format converter Seqret (http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/) on the command line, so that it can be called from a makefile. The command line is stored in the cmd.txt file. The .gb file produced works in ApE.
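The manual check described above (LOCUS line at the top, ORIGIN line present) can be sketched in a few lines. The FEATURES and "//" terminator checks are additions of this sketch, based on the standard GenBank flat-file layout; the function name is hypothetical, and passing all four checks does not guarantee that ApE will accept the file.

```python
def check_genbank_skeleton(text):
    """Report which structural lines of a GenBank flat file are present."""
    lines = text.splitlines()
    # The first non-empty line of a GenBank record should be the LOCUS line.
    first = next((line for line in lines if line.strip()), "")
    return {
        "LOCUS": first.startswith("LOCUS"),
        "FEATURES": any(line.startswith("FEATURES") for line in lines),
        "ORIGIN": any(line.startswith("ORIGIN") for line in lines),
        "terminator": any(line.strip() == "//" for line in lines),
    }
```

A file converted without its header would show LOCUS as False, which matches the problem met with the Artemis export.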

    Annotation of the Bacmid_WT_20160615

    10/10/2016

    The same pipeline was applied to Bacmid_WT_20160615. The command lines were stored in cmd.txt files covering the RATT run with the EMBL conversion and the BLAST visualisation.

Automation of the annotation pipeline

10/10/2016

To simplify the annotation pipeline, all the command lines used will be gathered in a Makefile. The steps:

  1. Make the annotation using RATT (input: EMBL file)

  2. Verify the annotation using Artemis (art) and BLAST/ACT

  3. Convert the *.final.embl file to a *.gb file

  4. If needed, modify the header of the *.gb file using a text editor

11/10/2016

Steps 1, 2 and 3 are integrated in the Makefile. The Makefile was successfully tested with Bacmid_WT_20160615.
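The actual commands live in the Makefile and cmd.txt and are not reproduced here. As an illustration only, the two non-interactive steps (RATT transfer and EMBL to GenBank conversion) could be assembled as below. The RATT argument order follows its documented usage (EMBL directory, query FASTA, result prefix, transfer type). The choice of the "Strain" transfer type, the seqret options and all file names are assumptions of this sketch.

```python
import shlex


def ratt_command(embl_dir, query_fasta, prefix, transfer_type="Strain"):
    """Build the RATT call; 'Strain' is an assumed transfer type."""
    return ["start.ratt.sh", embl_dir, query_fasta, prefix, transfer_type]


def seqret_command(embl_file, gb_file):
    """Build an EMBOSS seqret call that keeps features (-feature)."""
    return [
        "seqret",
        "-sequence", embl_file,
        "-outseq", gb_file,
        "-feature",
        "-osformat", "genbank",
    ]


def pipeline(embl_dir, query_fasta, prefix, final_embl, gb_file):
    """Return the commands in pipeline order (steps 1 and 3 of the list above)."""
    return [
        ratt_command(embl_dir, query_fasta, prefix),
        seqret_command(final_embl, gb_file),
    ]


if __name__ == "__main__":
    # Hypothetical file names; printing the commands is a dry run only.
    for cmd in pipeline("embl", "bacmid.fa", "run1",
                        "run1.bacmid.final.embl", "run1.bacmid.final.gb"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without PAGIT or EMBOSS. The verification step with Artemis and ACT stays interactive and is left out.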

Quality requirements

Important f