Optimal simultaneous superpositioning of multiple structures with missing data.

Theobald DL, Steindel PA - Bioinformatics (2012)

Bottom Line: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures.


Affiliation: Department of Biochemistry, Brandeis University, MS009, Waltham, MA 02454, USA. dtheobald@brandeis.edu

ABSTRACT

Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether.

Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case.

Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org.

Contact: dtheobald@brandeis.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
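
The least-squares special case mentioned under Results can be illustrated with a small sketch. It is not the THESEUS 2.0 implementation; it is a simplified illustration under stated assumptions: the points are 2D (so the optimal rotation has a closed form and no SVD is needed), missing points are marked with None, every landmark is observed in at least one structure, and each structure shares at least two landmarks with the first one for the initial fit. The names fit_rigid_2d, em_superpose and rigid_copy are hypothetical. In each cycle, missing points are filled with the current mean position (E-step), then the mean is recomputed and every structure is refit to it (M-step).

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_rigid_2d(mobile, target):
    """Return a function applying the least-squares rotation + translation
    that maps the paired 2D points `mobile` onto `target`."""
    cm, ct = centroid(mobile), centroid(target)
    dot = cross = 0.0
    for (mx, my), (tx, ty) in zip(mobile, target):
        ax, ay = mx - cm[0], my - cm[1]
        bx, by = tx - ct[0], ty - ct[1]
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)

    def transform(p):
        x, y = p[0] - cm[0], p[1] - cm[1]
        return (c * x - s * y + ct[0], s * x + c * y + ct[1])

    return transform


def em_superpose(structures, n_iter=50):
    """structures: equal-length lists whose entries are (x, y) or None (missing).
    Returns (superposed structures with None kept for missing points, mean)."""
    n_pts = len(structures[0])
    ref = structures[0]
    # Initialisation (an assumption of this sketch): fit each structure to
    # the first one using the landmarks observed in both.
    coords = []
    for s in structures:
        shared = [k for k in range(n_pts)
                  if s[k] is not None and ref[k] is not None]
        t = fit_rigid_2d([s[k] for k in shared], [ref[k] for k in shared])
        coords.append([t(p) if p is not None else None for p in s])
    mean = [centroid([c[k] for c in coords if c[k] is not None])
            for k in range(n_pts)]
    for _ in range(n_iter):
        # E-step: replace each missing point by its expected position,
        # which in the least-squares case is the current mean position.
        completed = [[p if p is not None else mean[k]
                      for k, p in enumerate(c)] for c in coords]
        # M-step: update the mean from the completed data, then refit
        # every completed structure to the new mean.
        mean = [centroid([c[k] for c in completed]) for k in range(n_pts)]
        refit = []
        for c, full in zip(coords, completed):
            t = fit_rigid_2d(full, mean)
            refit.append([t(p) if p is not None else None for p in c])
        coords = refit
    return coords, mean


def rigid_copy(shape, theta, dx, dy):
    """Rotate `shape` by theta and shift it by (dx, dy)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in shape]


# Toy data: three noise-free copies of one shape, each missing a different point.
shape = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 1.0)]
structures = [rigid_copy(shape, 0.3, 1.0, -2.0),
              rigid_copy(shape, -1.1, 4.0, 0.5),
              rigid_copy(shape, 2.0, -3.0, 2.0)]
structures[0][4] = None
structures[1][0] = None
structures[2][2] = None

aligned, mean = em_superpose(structures)
```

In this toy set only two of the five landmarks are present in all three structures, so the conventional approach would superpose on those two alone, whereas the sketch uses every observed point. Real macromolecular structures are three-dimensional and require an SVD- or quaternion-based rotation fit in place of the 2D closed form.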


Figure 8: Standard deviation of α-carbons for non-isotropic maximum-likelihood superpositions. For details, see the legend to Figure 7.

Mentions: The EM method produces a least-squares superposition much closer to the ‘true’ complete data superposition than the conventional method (Fig. 4). The EM superposition is also largely independent of which portions were fully aligned. The EM variances are much lower for the entire structure than the conventional method variances, and they are generally much closer to the ‘true’ variances (Fig. 7). Results for the non-isotropic ML superpositions are similar to those of the LS superpositions (Figs. 5 and 8). The EM method can easily handle the ‘impossible’ situation seen in Figure 2c, with results similar to the true superposition (Figs. 6, 7c and 8c).

