Optimal simultaneous superpositioning of multiple structures with missing data.

Theobald DL, Steindel PA - Bioinformatics (2012)

Bottom Line: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures.

Affiliation: Department of Biochemistry, Brandeis University, MS009, Waltham, MA 02454, USA. dtheobald@brandeis.edu

ABSTRACT

Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether.
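To make the conventional practice concrete, the following is a minimal sketch of how the "fully shared" subset is selected from an alignment. The function name `fully_shared_columns`, the boolean mask layout, and both toy alignments are assumptions made for illustration; none of them comes from the article or from THESEUS.

```python
import numpy as np

def fully_shared_columns(present):
    """Indices of alignment columns that are observed in every structure.

    present: (n_structures, n_positions) boolean array, True where a point exists.
    """
    present = np.asarray(present, dtype=bool)
    return np.flatnonzero(present.all(axis=0))

# Hypothetical 3-structure, 6-column alignment (False marks a gap).
present = np.array([
    [True,  True, True,  True, False, False],
    [True,  True, False, True, True,  True],
    [False, True, True,  True, True,  True],
])
shared = fully_shared_columns(present)

# Fraction of the observed points that the conventional method actually uses.
used_fraction = len(shared) * present.shape[0] / present.sum()

# Extreme case: every column has a gap somewhere, so nothing is fully shared
# and a conventional superposition cannot be computed at all.
staggered = np.array([
    [True,  True,  False],
    [False, True,  True],
    [True,  False, True],
])
no_shared = fully_shared_columns(staggered)
```

In the first toy alignment, fewer than half of the observed points survive the filter. In the second, the shared subset is empty, which corresponds to the extreme case described above.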

Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case.
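As an illustration of the general idea only, here is a simplified mean-imputation iteration in the spirit of EM for the isotropic least-squares case. It is a sketch under stated assumptions, not the algorithm as implemented in THESEUS: it assumes uncorrelated points with equal variance, assumes every alignment column is observed in at least one structure, and uses a plain Kabsch rotation for the superposition step. The names `kabsch`, `observed_rmsd` and `em_superpose`, and the synthetic test data, are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Proper rotation R (3x3) minimizing ||P @ R - Q|| for centered (k, 3) arrays."""
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def observed_rmsd(Y, M, present):
    """RMSD between the structures and the mean, counting observed points only."""
    diff = (Y - M)[present]                   # shape: (number of observed points, 3)
    return float(np.sqrt((diff ** 2).sum() / len(diff)))

def em_superpose(coords, present, n_iter=50):
    """coords: (n, k, 3), any value (e.g. NaN) where missing; present: (n, k) bool."""
    present = np.asarray(present, dtype=bool)
    Y = np.array(coords, dtype=float)         # working copy
    # Initial guess: center each structure on its observed points, then
    # average each column over the structures that contain it.
    for i in range(len(Y)):
        Y[i] -= Y[i][present[i]].mean(axis=0)
    Y[~present] = 0.0
    M = Y.sum(axis=0) / present.sum(axis=0)[:, None]
    history = []
    for _ in range(n_iter):
        # E-step (simplified): replace each missing point by the current mean.
        Y[~present] = np.broadcast_to(M, Y.shape)[~present]
        # M-step: ordinary complete-data superposition onto the mean ...
        centroid = M.mean(axis=0)
        for i in range(len(Y)):
            Yc = Y[i] - Y[i].mean(axis=0)
            Y[i] = Yc @ kabsch(Yc, M - centroid) + centroid
        # ... followed by re-estimating the mean structure.
        M = Y.mean(axis=0)
        history.append(observed_rmsd(Y, M, present))
    return Y, M, history

# Synthetic example: four noisy, rotated, translated copies of one structure.
rng = np.random.default_rng(0)
truth = rng.normal(size=(10, 3)) * 5.0
present = np.ones((4, 10), dtype=bool)
present[0, 7:] = False
present[1, :3] = False
present[2, 4:6] = False
coords = np.empty((4, 10, 3))
for i in range(4):
    a = 0.5 * (i + 1)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    noisy = truth + rng.normal(scale=0.1, size=truth.shape)
    coords[i] = noisy @ Rz + rng.normal(size=3)
coords[~present] = np.nan

Y, M, history = em_superpose(coords, present)
```

In this sketch no column is fully shared across all four structures, yet every observed point contributes to the fit. Because each step can only lower the completed-data sum of squares, the observed-point RMSD recorded in `history` should not increase from one iteration to the next. The maximum-likelihood (non-isotropic) case mentioned above would additionally require estimating a covariance matrix, which this sketch does not attempt.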

Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org.

Contact: dtheobald@brandeis.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figure 4: Least-squares (isotropic) superpositions with missing data. In each pane, four protein structures are superpositioned, each with a different conformation. The top row, (a)–(c), compares superpositions of proteins corresponding to the alignment in Figure 2a, where only residues in the α-helix are fully shared among the structures. Other regions of the structures, e.g. the two-stranded β-sheet on the right side of the images, are missing in some of the structures. The bottom row, (d)–(f), compares superpositions corresponding to the alignment in Figure 2b, where only residues in the β-sheet are fully shared. The left-most column, (a) and (d), shows superpositions found using the EM method described here. The middle column, (b) and (e), shows the reference superposition using all of the data; this can be thought of as the ‘true’ superposition before regions of the structures were deleted. For ease of comparison, in these images, the missing residues are not displayed, even though all of the original data were included in the superposition calculation. The right-most column, (c) and (f), shows conventional superpositions based on only the subset of fully shared residues. The structures used in these superpositions were derived from four NMR models of a zinc finger domain, PDB ID 1zfd.

Mentions: The EM method produces a least-squares superposition much closer to the ‘true’ complete data superposition than the conventional method (Fig. 4). The EM superposition is also largely independent of which portions were fully aligned. The EM variances are much lower for the entire structure than the conventional method variances, and they are generally much closer to the ‘true’ variances (Fig. 7). Results for the non-isotropic ML superpositions are similar to those of the LS superpositions (Figs. 5 and 8). The EM method can easily handle the ‘impossible’ situation seen in Figure 2c, with results similar to the true superposition (Figs. 6, 7c and 8c).

