Limits...
An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories.

Sun H, Ferhatosmanoglu H, Ota M, Wang Y - BMC Bioinformatics (2008)

Bottom Line: Current computation power enables researchers to produce a huge amount of folding simulation data.Hence there is a pressing need to be able to interpret and identify novel folding features from them.We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA. sun.82@osu.edu

ABSTRACT

Background: Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them.

Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage 1 demonstrates that our algorithm can detect similarities at rather low level, and extract biologically meaningful folding events.

Conclusion: The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For user's convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

Show MeSH
POG construction. Symbols inside the circles are the aligned node IDs. The table associated with each node encodes the set of points aligned to it. In particular, each row represents a point with its trajectory ID (T column) and its index along the trajectory (S column). For example, the entry (1, 2) associated with node b in (a) means that the aligned node b currently include the point , the second point from trajectory-1. In (a), a POG is initialized by the trajectory T1. An example of a POG after aligning a few trajectories is shown in (b). Note that a new node/branch is created when a point cannot be aligned to any existing nodes. For example, node e was created when  (i.e, the 3rd point of T2) was inserted. (c) shows the POG after merging point  from the node b to the node e constrained by the distance threshold ε.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2571979&req=5

Figure 6: POG construction. Symbols inside the circles are the aligned node IDs. The table associated with each node encodes the set of points aligned to it. In particular, each row represents a point with its trajectory ID (T column) and its index along the trajectory (S column). For example, the entry (1, 2) associated with node b in (a) means that the aligned node b currently include the point , the second point from trajectory-1. In (a), a POG is initialized by the trajectory T1. An example of a POG after aligning a few trajectories is shown in (b). Note that a new node/branch is created when a point cannot be aligned to any existing nodes. For example, node e was created when (i.e, the 3rd point of T2) was inserted. (c) shows the POG after merging point from the node b to the node e constrained by the distance threshold ε.

Mentions: (C1) indicates that the number of vertices of input curves aligned to each aligned node ok is greater than a size threshold τ . (C2) means that these aligned points are tightly clustered together (i.e, the diameter of them is bounded by a distance threshold ε). (C3) enforces that points in different aligned nodes still maintain their partial order along their respective trajectory, which means that oks are inherited and thus consistent to the points in each trajectory. Our goal is to maximize L, the size of such an alignment . See Figure 6(b) for an example of an alignment graph.


An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories.

Sun H, Ferhatosmanoglu H, Ota M, Wang Y - BMC Bioinformatics (2008)

POG construction. Symbols inside the circles are the aligned node IDs. The table associated with each node encodes the set of points aligned to it. In particular, each row represents a point with its trajectory ID (T column) and its index along the trajectory (S column). For example, the entry (1, 2) associated with node b in (a) means that the aligned node b currently include the point , the second point from trajectory-1. In (a), a POG is initialized by the trajectory T1. An example of a POG after aligning a few trajectories is shown in (b). Note that a new node/branch is created when a point cannot be aligned to any existing nodes. For example, node e was created when  (i.e, the 3rd point of T2) was inserted. (c) shows the POG after merging point  from the node b to the node e constrained by the distance threshold ε.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2571979&req=5

Figure 6: POG construction. Symbols inside the circles are the aligned node IDs. The table associated with each node encodes the set of points aligned to it. In particular, each row represents a point with its trajectory ID (T column) and its index along the trajectory (S column). For example, the entry (1, 2) associated with node b in (a) means that the aligned node b currently include the point , the second point from trajectory-1. In (a), a POG is initialized by the trajectory T1. An example of a POG after aligning a few trajectories is shown in (b). Note that a new node/branch is created when a point cannot be aligned to any existing nodes. For example, node e was created when (i.e, the 3rd point of T2) was inserted. (c) shows the POG after merging point from the node b to the node e constrained by the distance threshold ε.
Mentions: (C1) indicates that the number of vertices of input curves aligned to each aligned node ok is greater than a size threshold τ . (C2) means that these aligned points are tightly clustered together (i.e, the diameter of them is bounded by a distance threshold ε). (C3) enforces that points in different aligned nodes still maintain their partial order along their respective trajectory, which means that oks are inherited and thus consistent to the points in each trajectory. Our goal is to maximize L, the size of such an alignment . See Figure 6(b) for an example of an alignment graph.

Bottom Line: Current computation power enables researchers to produce a huge amount of folding simulation data.Hence there is a pressing need to be able to interpret and identify novel folding features from them.We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA. sun.82@osu.edu

ABSTRACT

Background: Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them.

Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage 1 demonstrates that our algorithm can detect similarities at rather low level, and extract biologically meaningful folding events.

Conclusion: The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For user's convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

Show MeSH