Preliminary Results

We select all residue pairs that DCA predicts to have a strong co-evolutionary signal but that are not in contact in the solved intra-domain structures (intra-chain false positives). We then test whether these pairs can be explained by inter-chain pairings.
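The selection step can be illustrated with a minimal sketch. It assumes that residue-residue distances are available as dictionaries keyed by residue index pairs and that a pair counts as a contact below a distance cutoff; the 8 Å value, the function name `classify_pairs`, and the data layout are illustrative assumptions, not the settings used in this work.

```python
from math import inf
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

# Assumed contact cutoff in Angstrom (illustrative; not taken from the text).
CONTACT_CUTOFF = 8.0


def classify_pairs(
    ranked_pairs: List[Pair],
    intra_dist: Dict[Pair, float],
    inter_dist: Dict[Pair, float],
    cutoff: float = CONTACT_CUTOFF,
) -> List[Tuple[Pair, bool]]:
    """Keep the intra-chain false positives and flag inter-chain contacts.

    ranked_pairs: residue pairs sorted by decreasing DCA score.
    intra_dist:   minimal distance of each pair within one chain.
    inter_dist:   minimal distance of each pair across two chains
                  (missing entries are treated as "not in contact").
    """
    results = []
    for i, j in ranked_pairs:
        # Pairs without an intra-chain distance are treated as not in contact.
        if intra_dist.get((i, j), inf) <= cutoff:
            continue  # intra-chain true positive: not of interest here
        # Residue i may sit on either chain, so check both orientations.
        d_inter = min(inter_dist.get((i, j), inf), inter_dist.get((j, i), inf))
        results.append(((i, j), d_inter <= cutoff))
    return results


if __name__ == "__main__":
    pairs = [(1, 5), (2, 9), (3, 7)]
    intra = {(1, 5): 4.0, (2, 9): 15.0, (3, 7): 20.0}
    inter = {(2, 9): 6.0, (7, 3): 12.0}
    print(classify_pairs(pairs, intra, inter))
```

In this toy input the pair (1, 5) is discarded as an intra-chain contact, while the two remaining pairs are kept and flagged according to whether an inter-chain contact explains them.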

Figures 5 and 6 show the inter-chain true positive rate of the residue pairs that are not in contact in the domain structure (intra-chain false positives), averaged over all Pfam families. Although the rate for the ten top-ranked pairs is higher than expected from random extraction, indicating that an inter-chain co-evolutionary signal is present, the prediction rate remains relatively low.
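One way to compute such a family-averaged rate is sketched below. It assumes that, for each family, the intra-chain false positives are stored in rank order as booleans marking whether the pair is an inter-chain contact, and that the rate at rank k is the fraction of contacts among the first k pairs. The function names and the choice to average only over families with at least k pairs are assumptions made for illustration.

```python
from typing import Dict, List


def cumulative_tpr(flags: List[bool]) -> List[float]:
    """Fraction of inter-chain contacts among the first k pairs, for each k."""
    hits = 0
    rates = []
    for k, is_contact in enumerate(flags, start=1):
        hits += int(is_contact)
        rates.append(hits / k)
    return rates


def mean_tpr_by_rank(families: Dict[str, List[bool]], max_rank: int) -> List[float]:
    """Average the cumulative rate over families, rank by rank.

    A family contributes to rank k only if it has at least k pairs.
    """
    curves = {name: cumulative_tpr(flags) for name, flags in families.items()}
    means = []
    for k in range(max_rank):
        values = [curve[k] for curve in curves.values() if len(curve) > k]
        if not values:
            break  # no family has this many intra-chain false positives
        means.append(sum(values) / len(values))
    return means


if __name__ == "__main__":
    # Hypothetical family labels and flags, for illustration only.
    toy = {"family_A": [True, False], "family_B": [False, False, True]}
    print(mean_tpr_by_rank(toy, max_rank=10))
```

A random-extraction baseline could be obtained in the same way by shuffling the candidate pairs before flagging them, which is what the averaged curve would be compared against.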

The distribution across families of the absolute rank of the first intra-domain false positive pairs (shown in figure 7) is highly heterogeneous, for several reasons. First, the prediction rate depends on the quality of the MSA and scales with the length of the protein. Moreover, the diversity of architectures within the domain families and the variety of possible inter-chain binding modes affect the relative magnitude of the intra- and inter-domain co-evolutionary signals.

The intra-chain false positives can be selected more reliably by looking at the value of the DCA score, which is more meaningful than the absolute rank of the pairs when comparing different Pfam families. The results of this analysis are summarized in figure 8. When residue pairs are selected on the basis of their DCA scores, the prediction rate is higher, reaching up to \(0.8\). On the other hand, families that have residue pairs with a high DCA score that are not in contact in the domain structures are infrequent (figure 9).
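The score-based selection can be sketched as follows, assuming the intra-chain false positives of all families are pooled as (DCA score, inter-chain contact flag) records. The function name `rate_above_threshold`, the record layout, and the threshold values are illustrative assumptions; the thresholds used for figure 8 are not specified here.

```python
from typing import List, Optional, Tuple

# (DCA score, True if the pair is an inter-chain contact)
Record = Tuple[float, bool]


def rate_above_threshold(
    records: List[Record], threshold: float
) -> Tuple[Optional[float], int]:
    """Prediction rate among pairs whose DCA score is at least `threshold`.

    Returns (rate, number of selected pairs); the rate is None when no
    pair passes the threshold, so an empty selection is not read as 0.
    """
    selected = [is_contact for score, is_contact in records if score >= threshold]
    if not selected:
        return None, 0
    return sum(selected) / len(selected), len(selected)


if __name__ == "__main__":
    # Made-up scores and flags, for illustration only.
    toy = [(0.9, True), (0.7, True), (0.6, False), (0.2, False)]
    for t in (0.5, 0.65, 0.95):
        print(t, rate_above_threshold(toy, t))
```

Returning the number of selected pairs alongside the rate makes the trade-off explicit: raising the threshold tends to increase the rate while shrinking the set of pairs, and hence of families, that contribute to it.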