Capturing coevolutionary signals in repeat proteins.

Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU - BMC Bioinformatics (2015)

Bottom Line: Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.

Affiliation: Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina.

ABSTRACT

Background: The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts - portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins - natural systems for which the identification of folding domains remains challenging.

Results: We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families.

Conclusions: The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.


Fig. 4: Correlations along ANK repeat arrays. a The first 50 direct information (DI) hits overlaid on a contact map (PDB: 1N11, chain A, residues 436 to 534), calculated for three consecutive ANK repeats without (upper triangle) or with (lower triangle) the DI id equalization. b Proportion of DI (black diamonds) and DI id (red circles) hits between repeated units for alignments of n-th neighbours. The red line is a non-linear fit of the DI id data to an exponential decay.

Mentions: An analogous correction to the sequence weights must be made to treat n-th neighbour interactions (lower triangle of Additional file 1: Figure S5 for the uncorrected DCA of three consecutive repeats of the ankyrin family). When the proper equalization is performed, the symmetric signals attenuate and the true coevolutionary correlations appear (DI id, lower triangle of Additional file 1: Figure S5). In principle, the correction to the symmetric (i, i+nL0) interactions can be applied to arbitrarily large repeat proteins. However, the sampling needed is much larger and the computing time grows as L^2, which restricts the application to longer repeat arrays. Since in ANKs, as in most repeat protein families, interactions are concentrated at relatively short sequence separations, we reconstructed a DI id matrix from a parallel calculation of repeat pairs. For first neighbours we estimated DI id as described previously, and for second neighbours we concatenated the sequences in an MSA of size 2L0. The reconstructed matrix for all interactions is very similar to the one calculated on the whole three-repeat MSA (Additional file 1: Figure S5), facilitating the application of the analysis to larger repeat arrays. In Fig. 4a we show the first 50 hits of DI (upper triangle) and DI id (lower triangle) overlaid on a contact map of three consecutive ANK repeats (PDB: 1N11, chain A; residues 436 to 534). The necessity of the equalization becomes more evident when longer repeat arrays are considered.
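
The reconstruction from repeat pairs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `block` and `assemble_three_repeats`, the tiling scheme, and the toy scores are all assumptions made for illustration. The sketch assumes that a score matrix of size 2L0 x 2L0 has already been computed for adjacent repeats and another for second neighbours, and that the within-repeat block is symmetric. It then tiles those blocks into a 3L0 x 3L0 matrix.

```python
def block(m, r, c, size):
    """Return the size x size sub-block of m at block-row r, block-column c."""
    return [row[c * size:(c + 1) * size] for row in m[r * size:(r + 1) * size]]


def assemble_three_repeats(first, second, L0):
    """Tile pair-level score matrices into a symmetric 3*L0 x 3*L0 matrix.

    first  -- 2*L0 x 2*L0 scores from alignments of adjacent repeats (k, k+1)
    second -- 2*L0 x 2*L0 scores from alignments of repeats (k, k+2)
    Both inputs are hypothetical; no scores are computed here.
    """
    intra = block(first, 0, 0, L0)   # positions within one repeat (assumed symmetric)
    near = block(first, 0, 1, L0)    # repeat k versus repeat k+1
    far = block(second, 0, 1, L0)    # repeat k versus repeat k+2

    n = 3 * L0
    full = [[0.0] * n for _ in range(n)]
    for a in range(3):
        for b in range(a, 3):
            src = (intra, near, far)[b - a]  # pick the block by repeat separation
            for i in range(L0):
                for j in range(L0):
                    full[a * L0 + i][b * L0 + j] = src[i][j]
                    full[b * L0 + j][a * L0 + i] = src[i][j]  # mirror to keep symmetry
    return full


# Toy input with L0 = 2; the numbers are made up and carry no biological meaning.
L0 = 2
first = [
    [0.0, 0.1, 0.5, 0.2],
    [0.1, 0.0, 0.3, 0.4],
    [0.5, 0.3, 0.0, 0.1],
    [0.2, 0.4, 0.1, 0.0],
]
second = [
    [0.0, 0.1, 0.05, 0.02],
    [0.1, 0.0, 0.03, 0.04],
    [0.05, 0.03, 0.0, 0.1],
    [0.02, 0.04, 0.1, 0.0],
]
full = assemble_three_repeats(first, second, L0)
```

In this sketch each pair-level calculation involves only 2L0 positions, and the two calculations are independent, so they could be run in parallel as the text describes. Whether the within-repeat block should come from one repeat, or from an average over repeats, is a design choice the excerpt does not specify.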

