An Algebro-topological description of protein domain structure.

Penner RC, Knudsen M, Wiuf C, Andersen JE - PLoS ONE (2011)

Bottom Line: Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH.


Affiliation: Center for the Topology and Quantization of Moduli Spaces, Department of Mathematical Sciences, Aarhus University, Aarhus, Denmark.

ABSTRACT
The space of possible protein structures appears vast and continuous, and the relationship between primary, secondary and tertiary structure levels is complex. Protein structure comparison and classification is therefore a difficult but important task, since structure is a determinant for molecular interaction and function. We introduce a novel mathematical abstraction based on geometric topology to describe protein domain structure. Using the locations of the backbone atoms and the hydrogen bonds, we build a combinatorial object, a so-called fatgraph. The description is discrete yet gives rise to a 2-dimensional mathematical surface. Thus, each protein domain corresponds to a particular mathematical surface with characteristic topological invariants, such as the genus (number of holes) and the number of boundary components. Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We introduce the notion of robust variables, that is, variables that are robust towards minor changes in the structure/fatgraph, and show that the genus and the number of boundary components are robust. Further, we investigate the distribution of different fatgraph variables and show that only four variables are capable of distinguishing different folds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH.
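To make the two invariants named in the abstract concrete, the following is a minimal sketch of how the genus and the number of boundary components can be read off a fatgraph. It is not the authors' implementation. It assumes a connected, orientable fatgraph given as a rotation system: each vertex is a cyclic ordering of half-edge labels, and half-edges `2k` and `2k+1` together form edge `k`. The function names and this encoding are illustrative assumptions. Constructions involving twisted edges, or the modified genus mentioned in the figure caption, are not handled here.

```python
def count_cycles(perm):
    """Count the cycles of a permutation given as a list (i maps to perm[i])."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycles += 1
        d = start
        while not seen[d]:
            seen[d] = True
            d = perm[d]
    return cycles


def fatgraph_invariants(vertices):
    """Return (genus, boundary_components) of a connected, orientable fatgraph.

    `vertices` is a list of cyclic orderings of half-edge labels 0..2E-1.
    Half-edges 2k and 2k+1 are the two ends of edge k (an assumed encoding).
    """
    n = sum(len(cyc) for cyc in vertices)
    # sigma: next half-edge in the cyclic order around its vertex
    sigma = [0] * n
    for cyc in vertices:
        for i, d in enumerate(cyc):
            sigma[d] = cyc[(i + 1) % len(cyc)]
    # alpha: jump to the other end of the same edge
    alpha = [d ^ 1 for d in range(n)]
    # Boundary components are the cycles of "cross the edge, then turn"
    phi = [sigma[alpha[d]] for d in range(n)]
    v, e, r = len(vertices), n // 2, count_cycles(phi)
    # Euler characteristic of the surface with boundary: V - E = 2 - 2g - r
    genus = (2 - v + e - r) // 2
    return genus, r


# One vertex with two interleaved loops: a torus with one boundary component
print(fatgraph_invariants([[0, 2, 1, 3]]))
# One vertex with two nested loops: a sphere with three boundary components
print(fatgraph_invariants([[0, 1, 2, 3]]))
```

The two examples differ only in the cyclic order at the single vertex, which illustrates why the cyclic ordering, and not just the underlying graph, determines the surface.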


Figure 8 (pone-0019670-g008): Alignment scores versus differences in modified genus and in number of boundary components for all pairs of S95-domains in the Pectate Lyase C-like topology (2.160.20). We use the normalized difference between modified genera (and similarly for boundary components) to take discrepancies in domain length into account. A high alignment score indicates high sequence similarity, and the plot illustrates that similarity at the primary level and similarity at the tertiary level are correlated.

Mentions: Structural divergence may be caused by only modest modifications at the amino acid sequence level, and we compared how differences in sequences are reflected in the topological invariants. Fig. 8 shows scatter plots of normalized alignment scores (Materials and Methods, section 7) versus normalized differences in modified genus and in number of boundary components, respectively, for all pairs of S95-domains in the topology Pectate Lyase C-like (2.160.20). In general, low sequence similarity implies relatively large differences in both invariants, with only a few outliers. For example, three domains have sequences very similar to that of 2iq7A00 (alignment score […]), but still the normalized differences in modified genus (resp. number of boundary components) are almost […] (resp. […]). This may be explained by a lower number of hydrogen bonds in 2iq7A00 compared with the three other domains, a feature captured by the topological invariants but not by sequences alone.

