Measuring global credibility with application to local sequence alignment.
Bottom Line:
Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
View Article:
PubMed Central - PubMed
Affiliation: Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, Washington, United States of America. bj@pnl.gov
ABSTRACT
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete spaces, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a 100(1 − α)% credibility limit, 0 ≤ α ≤ 1, is the minimum Hamming distance radius of a hyper-sphere containing 100(1 − α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
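The credibility limit described in the abstract can be illustrated with a minimal sketch. It assumes the posterior is represented by a list of sampled alignments, each encoded as a binary tuple (1 where a hypothetical pair of positions is aligned, 0 otherwise). The encoding, the function names, and the toy data are assumptions made for illustration and are not taken from the paper. The centroid here is a coordinate-wise majority vote, which minimizes the mean Hamming distance over unconstrained binary vectors; it does not check that the result is a valid alignment.

```python
def hamming(a, b):
    """Number of coordinates at which two equal-length binary tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def credibility_limit(estimate, samples, alpha):
    """Smallest Hamming radius around `estimate` that contains at least a
    fraction (1 - alpha) of the posterior samples (equal sample weights assumed)."""
    distances = sorted(hamming(estimate, s) for s in samples)
    target = (1.0 - alpha) * len(distances)
    if target <= 0:
        return 0
    for covered, d in enumerate(distances, start=1):
        # small tolerance guards against floating-point rounding in `target`
        if covered >= target - 1e-9:
            return d
    return distances[-1]


def centroid_estimate(samples):
    """Coordinate-wise majority vote over the samples (unconstrained sketch)."""
    n = len(samples)
    length = len(samples[0])
    return tuple(
        1 if 2 * sum(s[i] for s in samples) > n else 0
        for i in range(length)
    )


# Toy posterior sample (hypothetical data, five draws over four candidate pairs).
samples = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 1, 0, 0),
]

centroid = centroid_estimate(samples)
radius_80 = credibility_limit(centroid, samples, alpha=0.2)
far_radius_80 = credibility_limit((0, 0, 1, 1), samples, alpha=0.2)
```

In this toy sample, an estimate close to the bulk of the posterior needs a smaller radius to cover 80% of the draws than an estimate far from it, which is the sense in which a tighter credibility limit indicates a more representative point estimate.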
Mentions: While we believe this evidence supports reconsideration of the maximum scoring alignment paradigm, stronger evidence for reconsideration has been in the literature for over a decade. In 1995, Miyazawa [26] was the first to report what we now call centroid alignments [27]. In addition to his very insightful development of reliable alignments, he showed that these alignments are superior, using x-ray crystal structures of proteins as ground truth. Figure 7 (reproduced from Miyazawa's work [26], with permission of the author and Oxford Journals) shows that structural predictions based on reliable (centroid) alignments quite consistently produce lower root mean squared deviations than those based on maximum similarity alignments. Thus, from a practical biological perspective, there is already clear evidence in the literature that centroid alignments can be applied with advantage in the prediction of protein structures.