Characterization of non-trivial neighborhood fold constraints from protein sequences using generalized topohydrophobicity.

Fourty G, Callebaut I, Mornon JP - Bioinform Biol Insights (2008)

Bottom Line: From a large set of structural alignments processed from the FSSP database, we selected 1485 structural sub-families including at least 8 members, with accurate alignments and limited redundancy. We show that residues within helices, even when deeply buried, have few non-trivial neighbors (0-2), whereas beta-strand residues clearly exhibit a multimodal behavior, dominated by the local geometry of the tetrahedron (3 non-trivial close neighbors associated with one tetrahedron; 6 with two tetrahedra). Useful topological constraints on the immediate neighborhood of an amino acid, but also on its correlated solvent accessibility, can thus be derived using this approach, from the simple knowledge of multiple sequence alignments.


Affiliation: Département de Biologie Structurale, Institut de Minéralogie et de Physique des Milieux Condensés, CNRS UMR 7590 - Universités Paris 6/Paris 7, France.

ABSTRACT
Prediction of key features of protein structures, such as secondary structure, solvent accessibility and number of contacts between residues, provides useful structural constraints for comparative modeling, fold recognition, ab-initio fold prediction and detection of remote relationships. In this study, we aim at characterizing the number of non-trivial close neighbors, or long-range contacts of a residue, as a function of its "topohydrophobic" index deduced from multiple sequence alignments and of the secondary structure in which it is embedded. The "topohydrophobic" index is calculated using a two-class distribution of amino acids, based on their mean atom depths. From a large set of structural alignments processed from the FSSP database, we selected 1485 structural sub-families including at least 8 members, with accurate alignments and limited redundancy. We show that residues within helices, even when deeply buried, have few non-trivial neighbors (0-2), whereas beta-strand residues clearly exhibit a multimodal behavior, dominated by the local geometry of the tetrahedron (3 non-trivial close neighbors associated with one tetrahedron; 6 with two tetrahedra). This observed behavior allows the distinction, from sequence profiles, between edge and central beta-strands within beta-sheets. Useful topological constraints on the immediate neighborhood of an amino acid, but also on its correlated solvent accessibility, can thus be derived using this approach, from the simple knowledge of multiple sequence alignments.



Figure 5: Work positions. A. Number of work positions as a function of the percentage of major secondary structure observed at a position of FSSP-derived multiple alignments (x). For x = 75%, there are 135197 work positions (60021 H, 38860 E, 38316 C). B. Populations of the 27 work position types (see the Results section) in the final bank with the two groups (G1, G2) model. H stands for Helix, E for Extended (β-strand) and C for Coil.

Mentions: We take into account only work positions in which the same secondary structure is sufficiently conserved (at more than x%). Figure 5A shows the number of work positions as a function of this threshold x. We consider that x ≥ 75% offers an acceptable compromise: it ensures that work positions are structurally relevant in terms of secondary structure conservation while keeping enough data to perform a large-scale study. Figure 5B shows the distribution of work positions in the different secondary structures as a function of the generalized topohydrophobic index y1.
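The conservation filter described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`major_state`, `select_work_positions`), the representation of an alignment column as a string of H/E/C letters, the choice to ignore gap characters when computing the fraction, and the use of an inclusive threshold (x ≥ 75%) are all assumptions made for illustration.

```python
from collections import Counter

# Assumed alphabet: H = Helix, E = Extended (beta-strand), C = Coil.
STATES = {"H", "E", "C"}

def major_state(column):
    """Return (state, fraction) for the most frequent secondary-structure
    state in one alignment column. Characters outside H/E/C (for example
    gaps) are ignored; this treatment of gaps is an assumption."""
    counts = Counter(s for s in column if s in STATES)
    if not counts:
        return None, 0.0
    state, n = counts.most_common(1)[0]
    return state, n / sum(counts.values())

def select_work_positions(columns, threshold=0.75):
    """Keep the columns whose major state reaches the conservation threshold.
    Returns a list of (column index, major state, fraction) tuples."""
    selected = []
    for idx, column in enumerate(columns):
        state, fraction = major_state(column)
        if state is not None and fraction >= threshold:
            selected.append((idx, state, fraction))
    return selected

if __name__ == "__main__":
    # Hypothetical columns from an alignment of 8 members.
    columns = ["HHHHHHHH", "HHHHHHEC", "EEEEEECC", "HHEECC--"]
    for idx, state, fraction in select_work_positions(columns):
        print(idx, state, f"{fraction:.2f}")
```

Raising `threshold` trades data volume for structural consistency, which is the compromise that Figure 5A is described as illustrating.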

