4. Discussion
Almost one third of the PMMs reported in Table 2 involve R (18.4%) and
G (14.5%), with the remaining variants almost uniformly distributed
among the other amino acids (see Fig. 1b). The peculiar characteristics
of arginine and glycine have been invoked to explain this finding [27]:
arginine is prone to replacement, despite the protection offered by its
six different codons, because of the high C-to-T and G-to-A transition
probability observed at 5'-CpG dinucleotides [28]. However, the data
shown in Fig. 1a indicate that arginine is also very frequently
encountered in BMMs (15.1%), whereas glycine occurs there at only an
average frequency (6.5%), suggesting that different mechanisms
contribute to the pathogenicity of the two residues.
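The CpG argument can be made concrete with a minimal sketch, which is
not part of the original analysis. It assumes the standard genetic code
and, as a simplification, considers only CpG dinucleotides that lie
entirely within a codon, ignoring those spanning codon boundaries. The
function name cpg_transition_outcomes is hypothetical. For each codon of
a residue that contains a CpG, it reports the amino acid produced by a
C-to-T transition on the coding strand and by a G-to-A transition (a
C-to-T event on the opposite strand).

```python
# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cpg_transition_outcomes(residue):
    """Map each CpG-containing codon of `residue` to the pair
    (amino acid after C>T, amino acid after G>A) at that CpG.

    Simplifying assumption: only CpGs fully inside the codon are
    considered; '*' denotes a stop codon.
    """
    outcomes = {}
    for codon, aa in sorted(CODON_TABLE.items()):
        if aa != residue:
            continue
        pos = codon.find("CG")
        if pos == -1:
            continue  # no in-codon CpG (e.g. AGA, AGG for arginine)
        c_to_t = codon[:pos] + "T" + codon[pos + 1:]
        g_to_a = codon[:pos + 1] + "A" + codon[pos + 2:]
        outcomes[codon] = (CODON_TABLE[c_to_t], CODON_TABLE[g_to_a])
    return outcomes


if __name__ == "__main__":
    for codon, (ct, ga) in cpg_transition_outcomes("R").items():
        print(f"{codon}: C>T gives {ct}, G>A gives {ga}")
```

Under these assumptions, the four CGN codons of arginine are the only
ones affected, and their G-to-A products are glutamine and histidine,
the same replacement types (R/Q and R/H) discussed for ZFP568 and DUX4.
Glycine codons (GGN) contain no in-codon CpG, so the sketch returns an
empty result for G.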
A comparative topological analysis of BMMs and PMMs, summarized in Table
3, clearly indicates that arginine mutations are more likely to be
pathogenic when they occur at PISA-defined interfaces and, particularly,
at protein-DNA interfaces. Indeed, at protein-DNA interfaces arginine is
more than six times more frequent in PMMs than in BMMs. This feature is
fully consistent with the critical role that this amino acid plays in
interactions with nucleic acids [29]. Moreover, arginine PMMs tend not
to occur in buried protein moieties or at protein-protein interfaces,
whereas glycine PMMs exhibit the opposite trend. It is interesting to
note that glycine PMMs are found at protein-ligand interfaces at a
frequency well above the average, in agreement with the suggested role
of this amino acid in stabilizing concave moieties of the protein
surface [30]. The prevalent localization of pathogenic glycine mutations
suggests that replacing glycine with amino acids bearing larger side
chains causes structural stress and, hence, functional changes in the
mutated proteins.
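The kind of PMM-versus-BMM comparison described above can be sketched
as a simple fold-enrichment calculation. This is only an illustration
under stated assumptions: the function name fold_enrichment is
hypothetical, the normalization (residue count divided by the total
number of mutations in the same topological class) is assumed rather
than taken from Table 3, and the counts in the example are placeholders,
not values from this study.

```python
def fold_enrichment(pmm_hits, pmm_total, bmm_hits, bmm_total):
    """Ratio between the relative frequency of a residue among PMMs
    and its relative frequency among BMMs in one topological class.

    Assumed normalization: hits / total within each mutation set.
    Returns infinity when the residue never occurs among the BMMs.
    """
    if pmm_total <= 0 or bmm_total <= 0:
        raise ValueError("totals must be positive")
    pmm_freq = pmm_hits / pmm_total
    bmm_freq = bmm_hits / bmm_total
    if bmm_freq == 0:
        return float("inf")
    return pmm_freq / bmm_freq


if __name__ == "__main__":
    # Placeholder counts chosen only to illustrate the calculation.
    ratio = fold_enrichment(pmm_hits=24, pmm_total=64,
                            bmm_hits=2, bmm_total=32)
    print(f"fold enrichment: {ratio:.1f}")
```

A ratio above 1 indicates that the residue is over-represented among
pathogenic variants in that class, and a ratio below 1 indicates the
opposite trend.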
The presence among BMMs of three arginine substitutions at the
protein-DNA interface seems to contradict the relevance of this amino
acid at such interfaces. Hence, we manually checked the structural
features of these three BMMs, which are associated with two
transcription regulators, ZFP568 [31] and DUX4 [32], structurally
resolved in PDB ID: 5V3J and PDB ID: 5ZFZ, respectively. We used the two
PDB structures to generate the R98/Q mutant structure of ZFP568 and the
R411/Q and R599/H mutant structures of DUX4. Fig. 2 shows how R/Q and
R/H replacements can maintain protein-DNA binding through the glutamine
amide group and the histidine imidazole group, respectively. It is
important to note that in both cases the role of the original arginine
was not to keep the protein tightly bound to DNA, as would be required
for histones, since transcription regulators are rather mobile proteins
along DNA.
Thus, we have used the large number of entries contained in the ClinVar
database to generate maps of amino acid replacements, confirming that
arginine and glycine are the protein residues most frequently involved
in missense mutations. As expected, by comparing BMMs and PMMs, we have
also shown that amino acid similarity plays a significant role in
determining pathogenicity. With the present structural bioinformatics
approach, using PISA as a protein interface analyzer, we have searched
at atomic resolution for the features that are responsible for
pathogenic mutations. Arginine and glycine, the residues most frequently
involved in PMMs, turned out to represent two different mechanisms of
pathogenicity: arginine replacements tend to be pathogenic when they
affect interaction processes, whereas glycine substitutions can be
deleterious whenever they induce structural stress in the mutated
proteins. In the edgotype view of missense mutation effects [14],
arginine replacement perturbs network edges, whereas glycine replacement
modifies network nodes. The structural characterization of PMMs can be
extended beyond the current limits of the PISA database by implementing
algorithms that work on reliably predicted structures, for the
advancement of genomic medicine.