Abstract
Introduction In the era of genomics, bioinformatics has become
highly significant, assisting in the genome-wide discovery and
characterization of potential genomic regions of different enzymes for a
variety of industrial applications. Catalases are unique among
environmental biocatalysts due to their high catalytic rate and
thermostability. This communication therefore presents a bioinformatics
characterization of catalase protein sequences from diverse plant
sources, covering homology assessment, multiple sequence alignment,
phylogenetic tree construction, amino acid composition, physicochemical
properties, motif search, secondary and tertiary structure prediction,
and Ramachandran plot analysis.
Method In the present study, a total of 65 protein sequences of
catalases from diverse plant sources were retrieved from the NCBI
database and subjected to bioinformatics assessment for homology search,
multiple sequence alignment, phylogenetic tree construction, motif
search, and structure prediction using available in silico analytical
tools. Result The protein sequences of many enzymes have been assessed
and analyzed using bioinformatics tools. Among the plant sources
examined, the diversity of catalases was greatest for Oryza sativa. The
65 catalase protein sequences ranged in length from 90 to 533 amino acid
residues. Molecular weights ranged from 10322.46 to 61366.87 daltons,
and pI values from 4.53 to 7.95. The relatively high aliphatic index and
negative GRAVY values indicate that these proteins are thermostable and
hydrophilic, respectively. The
phylogenetic tree displayed unique clusters for each plant genus, and
numerous accessions of the same genus were clustered together,
suggesting similarity at the sequence level. Secondary structure
prediction for the catalases showed a predominance of random coil,
followed by alpha helix. The Ramachandran plot showed that most amino
acid residues lie in the core region, which represents the
favourable/allowed combinations of phi-psi values. The dark region
containing the most residues corresponds to no steric hindrance, i.e.,
these are the allowed regions for α-helical and β-sheet conformations.
Conclusion Five motifs were consistently identified across all
sequences, indicating that they were related to the plant catalase
PLN02609 family. Plant catalase PLN02609 is found in three kingdoms of
life and is known to perform a number of biosynthetic and degradative
functions. By analysing plant catalase protein sequences using
bioinformatics, it is possible to molecularly clone critical genes and
anticipate gene regulatory networks and whole-cell dynamics.
The Ramachandran plot showed that most amino acid residues lie in the
core region, which represents the favourable conformation of the
catalase residues. In silico study of protein sequences
elucidates the various catalytic sites, allowing for potential
modification to achieve desired properties.
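
As an illustration of the physicochemical measures mentioned in the
Result, the sketch below shows how GRAVY and the aliphatic index are
commonly computed from a protein sequence. It assumes the standard
Kyte-Doolittle hydropathy scale and Ikai's aliphatic index coefficients
(a = 2.9, b = 3.9). The function names and the example peptide are
hypothetical, and this is not necessarily the tool or exact procedure
used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values suggest an overall hydrophilic protein.
    Raises KeyError for non-standard residue codes.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq, a=2.9, b=3.9):
    """Aliphatic index: X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)).

    X is the mole percent of each residue; a and b are the assumed
    Ikai coefficients. Higher values are associated with thermostability.
    """
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Hypothetical toy peptide, not a sequence from the study.
    peptide = "MDPYKYRPSSAFNSPFWTTNSGAPVWNNNSSLTVG"
    print("GRAVY:", round(gravy(peptide), 3))
    print("Aliphatic index:", round(aliphatic_index(peptide), 2))
```

Both functions take a plain one-letter amino acid string, so they can be
applied to each retrieved sequence in turn to compare hydrophilicity and
predicted thermostability across accessions.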