Piecewise linear approximation of protein structures using the principle of minimum message length.
Bottom Line:
Relying solely on standard secondary structure may result in a significant loss of structural information. Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs which are often beset by them.
Affiliation: Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia. arun.konagurthu@monash.edu
ABSTRACT
Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features (that is, helices and strands of sheet) by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification crucially depend on the consistency and accuracy of external methods to assign secondary structures to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of definitions, along with errors and inconsistencies in experimental structure data, drastically limits their applicability to generating reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the hydrogen-bonding patterns and local substructural geometry on which current methods rely. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs which are often beset by them. The analysis of results over a large number of proteins suggests that the method produces a consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure.
Availability: http://www.csse.monash.edu.au/~karun/pmml.
Mentions: To explain the encoding of this part more clearly, consider Fig. 1. Let L_ij denote the line segment between two successive endpoints in 𝒬, Q_{r′} ≡ P_i and Q_{r′+1} ≡ P_j. This line is used to explain the intermediate points P_{i+1}, ..., P_{j−1} ∈ 𝒫. For any intermediate point P_r, i+1 ≤ r ≤ j−1, define three spatial deviations Δs_r, t_r and u_r. Taking these in reverse order, u_r is the signed distance of P_r from the plane defined by the vector P_j − P_i and the z-axis. To define t_r, first project P_r onto that plane; t_r is then the signed perpendicular distance of this projected point from the line L_ij. Finally, the deviation Δs_r is the (unsigned) lateral distance along the line L_ij between the points where the projected points for consecutive indices r−1 and r fall on the line (Fig. 1). (Refer to the supplementary note for a discussion of these deviations under arbitrary rotation of the coordinates.) Note that once the endpoints P_i and P_j are specified, and given the sets of spatial deviations Δs_r, t_r and u_r for the intermediate points P_r, ∀ i < r < j, the receiver can entirely recover the coordinates of all intermediate points.
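The geometry of these deviations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`segment_frame`, `encode`, `decode`) are hypothetical, the plane is assumed to pass through the first endpoint, and the lateral increment is stored with its sign (the text above describes its unsigned magnitude) so that the round trip is exact for arbitrary input. The degenerate case of a segment parallel to the z-axis is simply rejected.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    if norm == 0.0:
        raise ValueError("degenerate direction (zero-length vector)")
    return tuple(x / norm for x in a)

def segment_frame(p_i, p_j):
    """Orthonormal frame (d, m, n) for the segment from p_i to p_j.

    d runs along the line, n is the normal of the plane spanned by d and
    the z-axis, and m lies in that plane, perpendicular to the line.
    Raises ValueError if the segment is parallel to the z-axis.
    """
    d = unit(sub(p_j, p_i))
    n = unit(cross(d, (0.0, 0.0, 1.0)))
    m = cross(n, d)
    return d, m, n

def encode(points, i, j):
    """Return (ds, t, u) for each intermediate point between indices i and j.

    Assumption of this sketch: ds keeps its sign; abs(ds) is the unsigned
    lateral distance described in the text.
    """
    p_i = points[i]
    d, m, n = segment_frame(p_i, points[j])
    deviations = []
    prev_s = 0.0
    for r in range(i + 1, j):
        v = sub(points[r], p_i)
        u = dot(v, n)   # signed distance from the plane
        t = dot(v, m)   # signed in-plane distance from the line
        s = dot(v, d)   # position along the line
        deviations.append((s - prev_s, t, u))
        prev_s = s
    return deviations

def decode(p_i, p_j, deviations):
    """Recover the intermediate points from the endpoints and deviations."""
    d, m, n = segment_frame(p_i, p_j)
    s = 0.0
    recovered = []
    for ds, t, u in deviations:
        s += ds
        recovered.append(tuple(p_i[k] + s * d[k] + t * m[k] + u * n[k]
                               for k in range(3)))
    return recovered
```

For example, `decode(points[i], points[j], encode(points, i, j))` should return the intermediate points up to floating-point rounding, which mirrors the statement that the receiver can recover all intermediate coordinates from the endpoints and the deviations.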