An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories.

Sun H, Ferhatosmanoglu H, Ota M, Wang Y - BMC Bioinformatics (2008)

Affiliation: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA. sun.82@osu.edu

ABSTRACT

Background: Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produce huge amounts of folding simulation data. Hence there is a pressing need for methods that can interpret these data and identify novel folding features in them.

Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges that arise when comparing the high-dimensional curves derived from folding trajectories. A detailed case study on the miniprotein Trp-cage 1 demonstrates that our algorithm can detect similarities even at a rather low level and extract biologically meaningful folding events.
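As a minimal sketch of what "a trajectory as a multi-dimensional curve" can look like in code, assume each simulation frame has already been reduced to a fixed-length feature vector (the features themselves are not specified here). A curve is then just an ordered list of points. To illustrate curve comparison, the sketch uses dynamic time warping (DTW), a standard pairwise technique swapped in purely for illustration; it is not the EPO algorithm, which compares multiple curves at once. The names `Point`, `Curve`, `euclidean`, and `dtw_distance` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical representation: each point is one simulation frame that has
# already been reduced to a d-dimensional feature vector.
Point = Sequence[float]
Curve = List[Point]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two d-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(c1: Curve, c2: Curve) -> float:
    """Dynamic time warping (DTW) cost between two curves.

    A standard pairwise baseline, NOT the EPO algorithm: points of the two
    curves are matched monotonically, and one point may match several.
    """
    n, m = len(c1), len(c2)
    inf = float("inf")
    # dp[i][j]: cheapest alignment of the first i points of c1 with the first j of c2
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(c1[i - 1], c2[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


# Toy 2-D curves: b lingers at the state (1, 0); c is a shifted copy of a.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
c = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(dtw_distance(a, b))  # 0.0: b is a time-stretched copy of a
print(dtw_distance(a, c))  # 3.0: every matched pair is 1.0 apart
```

A time-stretched copy of a curve gets zero cost under DTW, which is why warping-based matching suits trajectories that pass through the same states at different speeds.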

Conclusion: The EPO algorithm is general and applicable to a wide range of problems. We demonstrate its generality and effectiveness by applying it to the alignment of multiple protein structures with low similarity. For users' convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

Figure 7: An example for the scoring function. Empty and solid points are aligned to the nodes o_a and o_b, respectively. Although the new point p (the star) is closer to ω(o_b), it is better grouped with the points aligned to o_a. Ideally, therefore, p should be aligned to o_a rather than to o_b.
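The situation in Figure 7 can be reproduced numerically. The sketch below uses made-up 2-D coordinates (not data from the paper) and compares two hypothetical assignment rules: plain nearest-center assignment, which picks o_b, and a simple distribution-aware rule that looks at the points already aligned to each node, which picks o_a. The second rule only illustrates why the distribution of aligned points matters; it is not the two-level scoring function δ(o, p) the authors define, and the names `by_center` and `by_aligned_points` are illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Toy data mimicking Figure 7 (coordinates are invented for illustration).
centers: Dict[str, Point] = {"o_a": (0.0, 0.0), "o_b": (6.0, 0.0)}
aligned: Dict[str, List[Point]] = {
    # points already aligned to o_a are widely spread around its center
    "o_a": [(-3.0, 0.0), (0.0, 3.0), (3.0, 0.0), (0.0, -3.0)],
    # points already aligned to o_b form a tight cluster
    "o_b": [(5.8, 0.0), (6.2, 0.0), (6.0, 0.2), (6.0, -0.2)],
}
p: Point = (3.5, 0.0)  # the new point (the star)


def by_center(q: Point) -> str:
    """Pick the node whose center omega(o) is closest to q (ignores aligned points)."""
    return min(centers, key=lambda o: math.dist(q, centers[o]))


def by_aligned_points(q: Point) -> str:
    """Hypothetical distribution-aware rule: pick the node owning the aligned
    point nearest to q. NOT the paper's two-level scoring function."""
    return min(aligned, key=lambda o: min(math.dist(q, r) for r in aligned[o]))


print(by_center(p))          # o_b (distance 2.5 vs. 3.5)
print(by_aligned_points(p))  # o_a (nearest aligned point 0.5 vs. 2.3 away)
```

A nearest-aligned-point rule reacts to the spread of each node but is easily swayed by a single outlying point, so on its own it is a weak substitute for a stable reference such as a node center.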

Mentions: Natural choices for the node center ω(o) of a node o include a previously computed canonical cluster center, or the center of the minimum enclosing ball of the points already aligned to this node (or some weighted variant of it); the latter is a dynamic point that depends on the alignment order. The advantage of the former is that canonical cluster centers tend to be spread apart, which helps increase the coverage of aligned nodes. Furthermore, canonical cluster centers are good candidates for node centers because we already know that many points lie around them. Their disadvantage is that they ignore the distribution of points already aligned to the node. See Figure 7: without considering the distribution of points aligned to o_a and o_b, the new point p would be aligned to o_b even though o_a is the better choice. Using the center of the minimum enclosing ball alleviates this problem. However, the influence regions of nodes produced this way tend to overlap much more than those produced by canonical cluster centers, and the positions of these centers also depend heavily on the order in which curves are aligned. We combine the advantages of both approaches into the following two-level scoring function for measuring the similarity δ(o, p).
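The second option, the center of the minimum enclosing ball (MEB) of the points already aligned to a node, can be approximated in a few lines. The sketch below uses the Bădoiu-Clarkson iteration, a simple (1 + ε)-approximation swapped in for illustration; the excerpt does not say how the authors compute the ball. The function name `approx_meb_center` and its `eps` parameter are hypothetical. Because the set of points aligned to a node grows as curves are aligned one after another, the recomputed center moves during alignment, which is the order dependence described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def approx_meb_center(points: Sequence[Point], eps: float = 0.05) -> Point:
    """Approximate center of the minimum enclosing ball (Badoiu-Clarkson iteration).

    Start at an arbitrary input point and repeatedly move toward the currently
    farthest point by a shrinking step 1/(i+1). O(1/eps^2) iterations give an
    enclosing radius within a factor (1 + eps) of the optimum.
    """
    if not points:
        raise ValueError("need at least one point")
    c: List[float] = list(points[0])
    for i in range(1, math.ceil(1.0 / eps ** 2) + 1):
        far = max(points, key=lambda q: math.dist(c, q))
        step = 1.0 / (i + 1)
        c = [ck + step * (fk - ck) for ck, fk in zip(c, far)]
    return tuple(c)


# The node center moves as more points are aligned to the node.
node_points = [(0.0, 0.0), (2.0, 0.0)]
print(approx_meb_center(node_points))  # close to (1.0, 0.0)
node_points.append((1.0, 3.0))         # a newly aligned point
print(approx_meb_center(node_points))  # exact MEB center here is (1.0, 4/3)
```

With the default eps = 0.05 the loop runs 400 times, each pass scanning all aligned points, so the cost grows linearly with the number of points; exact methods such as Welzl's algorithm exist but are most practical in low dimensions.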