Measuring global credibility with application to local sequence alignment.

Webb-Robertson BJ, McCue LA, Lawrence CE - PLoS Comput. Biol. (2008)

Affiliation: Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, Washington, United States of America. bj@pnl.gov

ABSTRACT
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been paid to the development of global measures of uncertainty. Whatever procedure is used, when it delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. In high-D discrete spaces these ensembles are immense, so considerable uncertainty surrounds any single answer. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a 100(1 − α)% credibility limit (0 ≤ α ≤ 1) is the minimum Hamming distance radius of a hypersphere, centered on the point estimate, that contains 100(1 − α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
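
The two estimators and the credibility limit defined above can be made concrete with a small Monte Carlo sketch. It assumes, as an illustration rather than the authors' implementation, that each alignment is a set of aligned position pairs (i, j), that the Hamming distance between two alignments is the size of their symmetric difference, and that the posterior is approximated by alignments sampled from it. Under that representation the centroid keeps every pair whose sampled marginal frequency exceeds 0.5; two conflicting pairs never occur in the same sample, so their frequencies cannot both exceed 0.5, and the result is itself a valid alignment. All function names are hypothetical.

```python
from collections import Counter
from math import ceil

# Hypothetical representation: an alignment is a frozenset of aligned
# position pairs (i, j); unaligned positions are implicit gaps.

def hamming(a, b):
    """Hamming distance between two alignments viewed as 0/1 indicator
    vectors over all (i, j) pairs, i.e. the size of the symmetric difference."""
    return len(a ^ b)

def centroid(samples):
    """Centroid under this Hamming distance: keep each pair whose sampled
    marginal frequency is strictly greater than 0.5."""
    n = len(samples)
    counts = Counter(pair for s in samples for pair in s)
    return frozenset(pair for pair, c in counts.items() if c / n > 0.5)

def credibility_limit(estimate, samples, alpha=0.05):
    """Smallest Hamming radius around `estimate` that contains at least a
    (1 - alpha) fraction of the posterior samples."""
    n = len(samples)
    # Number of samples that must fall inside the hypersphere; rounding
    # guards against floating-point noise in (1 - alpha) * n.
    k = ceil(round((1 - alpha) * n, 9))
    if k == 0:
        return 0
    dists = sorted(hamming(estimate, s) for s in samples)
    return dists[k - 1]

if __name__ == "__main__":
    samples = [
        frozenset({(0, 0), (1, 1), (2, 2)}),
        frozenset({(0, 0), (1, 1), (2, 2)}),
        frozenset({(0, 0), (1, 1), (2, 3)}),
        frozenset({(0, 1), (1, 2)}),
    ]
    ec = centroid(samples)
    print(sorted(ec))
    print(credibility_limit(ec, samples, alpha=0.25))
```

Because the limit is computed around a given estimate, the same samples yield different limits for different estimators, which is what allows centroid and maximum similarity alignments to be compared.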

Figure 5. Histograms of the distances of the sampled alignments from the centroid (EC) and maximum similarity (MS) alignments for the intergenic regions upstream of orthologous genes from SMR4 and CN32. (A) Alignment distribution for the regions upstream of the orthologous genes SMR4_0576 and CN32_3301. (B) Alignment distribution for the orthologous regions upstream of the arginine decarboxylase (speA) genes SMR4_1557 and CN32_1647.
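
Histograms like these can be built directly from posterior samples: compute each sampled alignment's distance from a fixed estimate and tally the distances. The sketch below assumes the same hypothetical pair-set representation and symmetric-difference Hamming distance; in the toy usage, one sampled alignment merely stands in for the MS alignment, which in practice would be the likelihood-maximizing alignment. Since a credibility limit is a quantile of exactly this distance distribution, a histogram concentrated at small distances implies a small credibility limit.

```python
from collections import Counter

def hamming(a, b):
    # Alignments as sets of aligned (i, j) pairs; distance = size of a ^ b.
    return len(a ^ b)

def distance_histogram(estimate, samples):
    """Return sorted (distance, count) pairs: how many sampled alignments
    lie at each Hamming distance from `estimate`."""
    counts = Counter(hamming(estimate, s) for s in samples)
    return sorted(counts.items())

def print_histogram(label, hist):
    """Render a (distance, count) histogram as text bars."""
    print(label)
    for distance, count in hist:
        print(f"{distance:4d} | {'#' * count}")

if __name__ == "__main__":
    samples = [
        frozenset({(0, 0), (1, 1), (2, 2)}),
        frozenset({(0, 0), (1, 1), (2, 2)}),
        frozenset({(0, 0), (1, 1), (2, 3)}),
        frozenset({(0, 1), (1, 2)}),
    ]
    ec = frozenset({(0, 0), (1, 1)})  # toy centroid of these samples
    ms = samples[0]                   # hypothetical stand-in for the MS alignment
    print_histogram("EC", distance_histogram(ec, samples))
    print_histogram("MS", distance_histogram(ms, samples))
```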

Mentions: We further evaluated the findings shown in Figure 3 in the context of a single gene's orthologous upstream sequences. When promoter sequences are compared across species, it is often unknown a priori which sequences would be most beneficial to align. The tight credibility limits shown in Figure 4A and 4B indicate that, when evaluating the promoter region of SMR4_0576, we can have confidence in its alignments with the orthologous regions from SONE and CN32 (and with SMR7, data not shown). This is not the case for the orthologous regions from SPV4 and DENI: the high ND95 values for both the EC and MS alignments indicate that aligning the SPV4 or DENI sequences would not contribute to a meaningful evaluation of the SMR4_0576 promoter region. Unfortunately, not all alignments of SMR4 promoter regions with the promoter sequences of their orthologs in SONE and CN32 are reliable. For example, as Figure 5 shows, the posterior distribution of alignments of the SMR4_1557 promoter region with its CN32 ortholog is substantially more widespread and variable than the posterior distribution of alignments of the SMR4_0576 promoter region with its CN32 ortholog.
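
The screening decision described here (which orthologous upstream regions are worth aligning with a given promoter) can be sketched as a filter on credibility limits. This is a hypothetical helper, not the authors' procedure: it uses the raw 95% credibility limit under the same assumed pair-set representation rather than the ND95 statistic, whose normalization is not given in this excerpt, and the cutoff `max_radius` is an arbitrary user choice.

```python
from math import ceil

def hamming(a, b):
    # Alignments as sets of aligned (i, j) pairs; distance = size of a ^ b.
    return len(a ^ b)

def credibility_limit(estimate, samples, alpha=0.05):
    """Smallest Hamming radius around `estimate` containing at least a
    (1 - alpha) fraction of the posterior samples."""
    k = ceil(round((1 - alpha) * len(samples), 9))
    if k == 0:
        return 0
    return sorted(hamming(estimate, s) for s in samples)[k - 1]

def screen_orthologs(candidates, max_radius, alpha=0.05):
    """Hypothetical screen. `candidates` maps a label (e.g. an ortholog pair)
    to (estimate, posterior_samples). Returns (label, limit, trusted) tuples,
    tightest limit first; `trusted` means limit <= max_radius."""
    rows = []
    for label, (estimate, samples) in candidates.items():
        limit = credibility_limit(estimate, samples, alpha)
        rows.append((label, limit, limit <= max_radius))
    return sorted(rows, key=lambda row: row[1])
```

Sorting by limit puts the pairs whose alignments are most trustworthy first, which matches the comparison made in the text between tightly and widely distributed posteriors.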

