An Algebro-topological description of protein domain structure.

Penner RC, Knudsen M, Wiuf C, Andersen JE - PLoS ONE (2011)

Bottom Line: Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH.


Affiliation: Center for the Topology and Quantization of Moduli Spaces, Department of Mathematical Sciences, Aarhus University, Aarhus, Denmark.

ABSTRACT
The space of possible protein structures appears vast and continuous, and the relationship between primary, secondary and tertiary structure levels is complex. Protein structure comparison and classification is therefore a difficult but important task, since structure is a determinant for molecular interaction and function. We introduce a novel mathematical abstraction based on geometric topology to describe protein domain structure. Using the locations of the backbone atoms and the hydrogen bonds, we build a combinatorial object, a so-called fatgraph. The description is discrete yet gives rise to a 2-dimensional mathematical surface. Thus, each protein domain corresponds to a particular mathematical surface with characteristic topological invariants, such as the genus (number of holes) and the number of boundary components. Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We introduce the notion of robust variables, that is, variables that are robust towards minor changes in the structure/fatgraph, and show that the genus and the number of boundary components are robust. Further, we investigate the distribution of different fatgraph variables and show that only four variables are capable of distinguishing different folds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH.
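
As a rough illustration of how the two invariants named in the abstract can be read off a fatgraph, the sketch below computes the genus and the number of boundary components from a rotation system. It is not the authors' construction from backbone atoms and hydrogen bonds: the function name `fatgraph_invariants`, the half-edge numbering, and the restriction to connected, orientable fatgraphs (no twisted edges) are all assumptions made for this example. It uses the standard relation V - E = 2 - 2g - r for a connected orientable surface with r boundary components.

```python
def fatgraph_invariants(vertex_cycles):
    """Return (genus, boundary_components) of a connected, orientable fatgraph.

    Hypothetical encoding (an assumption of this sketch):
    - every edge k consists of the two half-edges 2k and 2k + 1;
    - each vertex is a list of its half-edges in cyclic order.
    Connectivity is assumed, not checked.
    """
    darts = [d for cycle in vertex_cycles for d in cycle]
    if len(darts) % 2 or sorted(darts) != list(range(len(darts))):
        raise ValueError("half-edges must be exactly 0 .. 2E-1, each used once")

    # sigma: next half-edge in the cyclic order around the same vertex.
    sigma = {}
    for cycle in vertex_cycles:
        for i, d in enumerate(cycle):
            sigma[d] = cycle[(i + 1) % len(cycle)]

    # phi: cross the edge (d ^ 1 is the partner half-edge), then turn at the
    # vertex. The cycles of phi trace the boundary components of the surface.
    phi = {d: sigma[d ^ 1] for d in darts}

    seen = set()
    boundaries = 0
    for d in darts:
        if d in seen:
            continue
        boundaries += 1
        while d not in seen:
            seen.add(d)
            d = phi[d]

    vertices = len(vertex_cycles)
    edges = len(darts) // 2
    # Euler characteristic: V - E = 2 - 2g - r, solved for g.
    genus = (2 - vertices + edges - boundaries) // 2
    return genus, boundaries


# One vertex with two interleaved loops: a torus with one boundary component.
print(fatgraph_invariants([[0, 2, 1, 3]]))
# The same two loops without interleaving: a sphere with three boundary components.
print(fatgraph_invariants([[0, 1, 2, 3]]))
```

The two calls differ only in the cyclic order of half-edges at the single vertex, which shows that the invariants depend on the fatgraph structure and not just on the underlying graph.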


pone-0019670-g005: The domain 1o88A00 is classified as Pectate Lyase C-like (2.160.20) with complete CATHSOLID classification 2.160.20.10.11.2.1.1.1. The Class plot shows for all domains with (colored according to A-level), and the Architecture plot shows for all domains with (colored according to the three T-levels). This continues all the way down to the last plot where are shown for for .

Mentions: Fig. 5 shows an example of how and separate domains at different CATHSOLID levels. It transpires that the best separation is obtained at T-, H-, and S-levels. The grouping at the A-level is often very broad, and an architecture may comprise domains of very different sizes. Furthermore, since the order of the secondary structure elements is not taken into account at the A-level, a single architecture may contain domains with very different connectivities [4], [25]. This is likely the explanation for the lack of separation of A-levels observed in Fig. 5. On the other hand, because the fatgraph approach is based on structural features, we do not expect to see a clear separation at the SOLID levels, since these are defined in terms of sequence overlap and similarity. Fig. 6 shows that and separate the H-level families in the CATH topology Pectate Lyase C-like (CATH classification 2.160.20) with one family (red in Fig. 6) being larger than the others. To test the empirical robustness of the variables, we generated modified structures for each domain using the CONCOORD algorithm [26] and calculated and from the resulting structures (Fig. 6 and Materials and Methods, section 4). The figure indicates that even after modifications, the variables are able to separate domains at the H-level. Furthermore, for individual domains, the variables did not in general deviate significantly from the original values (illustrated in Fig. S2).

