Optimal simultaneous superpositioning of multiple structures with missing data.
Bottom Line:
In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case.
Affiliation: Department of Biochemistry, Brandeis University, MS009, Waltham, MA 02454, USA. dtheobald@brandeis.edu
ABSTRACT
Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether.
Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case.
Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org.
Contact: dtheobald@brandeis.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Mentions: The EM method produces a least-squares superposition much closer to the ‘true’ complete data superposition than the conventional method (Fig. 4). The EM superposition is also largely independent of which portions were fully aligned. The EM variances are much lower for the entire structure than the conventional method variances, and they are generally much closer to the ‘true’ variances (Fig. 7). Results for the non-isotropic ML superpositions are similar to those of the LS superpositions (Figs. 5 and 8). The EM method can easily handle the ‘impossible’ situation seen in Figure 2c, with results similar to the true superposition (Figs. 6, 7c and 8c).
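To make the expectation-maximization idea from the abstract concrete, the following is a minimal Python/NumPy sketch of the isotropic least-squares special case only. It is not the THESEUS implementation, and the function names `kabsch` and `em_superpose`, the use of NaN rows to mark missing points, and the fixed iteration count are all assumptions made for illustration. The sketch also assumes that every point is observed in at least one structure. In the E-step each missing point is replaced by its current expected position (the mean structure), and in the M-step every completed structure is re-superposed onto the mean.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R minimising ||P @ R - Q|| for centred N x 3 arrays P and Q."""
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against improper rotations
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def em_superpose(structures, n_iter=50):
    """Illustrative EM-style least-squares superposition with missing points.

    structures: array of shape (K, N, 3); a row of NaNs marks a missing point.
    Assumes each of the N points is observed in at least one structure.
    Returns the superposed, completed coordinates and the mean structure.
    """
    X = np.array(structures, dtype=float)
    missing = np.isnan(X).any(axis=2)         # (K, N) mask of missing points
    Y = X.copy()

    # Initial guess: fill each missing point with the mean of its observed copies.
    mean = np.nanmean(X, axis=0)
    Y[missing] = np.broadcast_to(mean, Y.shape)[missing]

    for _ in range(n_iter):
        # M-step: centre each completed structure and rotate it onto the mean.
        mean_c = mean - mean.mean(axis=0)
        for k in range(Y.shape[0]):
            Yk = Y[k] - Y[k].mean(axis=0)
            Y[k] = Yk @ kabsch(Yk, mean_c)
        mean = Y.mean(axis=0)

        # E-step: the expected position of a missing point is the current mean.
        Y[missing] = np.broadcast_to(mean, Y.shape)[missing]

    return Y, mean
```

Unlike the conventional approach of discarding every column that contains a gap, this sketch lets all observed points contribute to each rotation, translation, and mean estimate. The maximum-likelihood variants mentioned in the abstract would additionally estimate variances, which this sketch does not attempt.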