Measuring global credibility with application to local sequence alignment.

Webb-Robertson BJ, McCue LA, Lawrence CE - PLoS Comput. Biol. (2008)

Bottom Line: Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.

Affiliation: Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, Washington, United States of America. bj@pnl.gov

ABSTRACT
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1-α)%, 0 ≤ α ≤ 1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1-α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
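
The credibility limit defined in the abstract can be approximated from a sample of alignments drawn from the posterior. The following is a minimal sketch under stated assumptions, not the authors' implementation: each alignment is represented as a set of aligned index pairs (i, j), the Hamming distance between two alignments is taken to be the number of pairs present in one but not the other, and the posterior is approximated by equally weighted samples. The names `hamming` and `credibility_limit` are hypothetical.

```python
import math
from typing import FrozenSet, Sequence, Tuple

Pair = Tuple[int, int]          # (index in sequence 1, index in sequence 2)
Alignment = FrozenSet[Pair]     # an alignment as a set of aligned pairs


def hamming(a: Alignment, b: Alignment) -> int:
    """Number of aligned pairs present in exactly one of the two alignments."""
    return len(a ^ b)


def credibility_limit(estimate: Alignment,
                      samples: Sequence[Alignment],
                      alpha: float) -> int:
    """Smallest radius r such that at least a (1 - alpha) fraction of the
    sampled alignments lies within Hamming distance r of `estimate`.

    Assumption: `samples` are equally weighted draws from the posterior.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not samples:
        raise ValueError("at least one sampled alignment is required")
    dists = sorted(hamming(estimate, s) for s in samples)
    # Number of samples the sphere must contain; the small tolerance
    # guards against floating-point round-off in (1 - alpha) * n.
    needed = math.ceil((1.0 - alpha) * len(dists) - 1e-9)
    if needed <= 0:
        return 0
    return dists[needed - 1]
```

Calling `credibility_limit` once with the maximum similarity alignment and once with the centroid alignment, on the same samples and the same alpha, gives the two radii that the abstract compares; a smaller radius means the posterior mass sits closer to that estimate.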

Figure 6. Heat-map alignment representation of the EC. Sequence indices are given on the left and the color gradient associated with aligned residue probabilities is given on the right. Sequence regions that have no aligned pairs with a probability greater than 0.5 are ignored by the alignment, grayed out, and aligned with a dot to differentiate these from insertion/deletion events that utilize a dash.

Mentions: Heat maps or other visual displays of confidence in the alignment of individual pairs of bases must accommodate a feature specific to centroid alignments. Unlike standard alignments, EC alignments allow stretches of sequence in the middle of an alignment to remain unaligned, in a manner analogous to the regions at the ends of local alignments. That is, a residue in one sequence that cannot be reliably aligned with any single residue in the other sequence is excluded from the centroid alignment, because aligning such a residue to any base in the other sequence would only increase the average distance of the centroid alignment from the posterior distribution of alignments. In addition, with probabilistic alignment, we return marginal probabilities of all residue pairs. Therefore, to display all the features of this alignment, we employ 1) a traditional dash to represent gaps, 2) a dot to represent residues that cannot be reliably aligned and are thus ignored in the alignment, and 3) a gradient color scheme (i.e., a heat map) to show the residue pair alignment probabilities, where red indicates high probability for that residue pair and green indicates probabilities nearing 50%. The ignored regions are grayed out to further mark those residues for which the variability in alignments is too great to permit marginal pair probabilities of 0.5 or greater. Figure 6 gives an example of the heat map alignment display for a human/rodent intergenic sequence pair (the region upstream of the MYL2 gene). The red-to-green coloring of aligned regions allows quick distinction of areas of alignment of high versus low confidence.
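
The display conventions described above (dash, dot, and color gradient) can be illustrated with a short sketch. It rests on several assumptions that are not from the paper: marginal probabilities arrive as a dictionary keyed by 0-based index pairs, the retained pairs are collinear, the green-to-red gradient is a linear RGB ramp, and an unpaired stretch is drawn with dots when both sequences have unpaired residues there and with dashes when only one does. The names `centroid_pairs`, `heat_color`, and `render` are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]
GRAY = (128, 128, 128)


def centroid_pairs(marginals: Dict[Pair, float],
                   threshold: float = 0.5) -> List[Pair]:
    """Keep residue pairs whose marginal probability exceeds the threshold."""
    return sorted(pair for pair, p in marginals.items() if p > threshold)


def heat_color(p: float) -> Tuple[int, int, int]:
    """RGB color for a pair probability: gray at or below 0.5, then a linear
    ramp from green (just above 0.5) to red (1.0). The ramp is an assumption."""
    if p <= 0.5:
        return GRAY
    t = (min(p, 1.0) - 0.5) / 0.5
    return (round(255 * t), round(255 * (1 - t)), 0)


def render(x: str, y: str, pairs: List[Pair]) -> Tuple[str, str]:
    """Two display rows for sequences x and y given sorted, collinear pairs."""
    top: List[str] = []
    bottom: List[str] = []
    i = j = 0
    # A sentinel pair past both ends flushes the trailing unpaired residues.
    for pi, pj in pairs + [(len(x), len(y))]:
        dx, dy = x[i:pi], y[j:pj]
        if dx and dy:
            # Unpaired residues in both sequences: unreliable region (dots).
            top.append(dx + "." * len(dy))
            bottom.append("." * len(dx) + dy)
        else:
            # Unpaired residues in one sequence only: indel (dashes).
            top.append(dx + "-" * len(dy))
            bottom.append("-" * len(dx) + dy)
        if pi < len(x):  # a real aligned pair, not the sentinel
            top.append(x[pi])
            bottom.append(y[pj])
        i, j = pi + 1, pj + 1
    return "".join(top), "".join(bottom)
```

For instance, with x = "ACGTA", y = "ACTTA", and a marginal of 0.3 for the middle pair, `render` leaves G and T unpaired and places each against a dot, while the remaining columns would be colored by `heat_color` of their marginal probabilities.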

