Measuring global credibility with application to local sequence alignment.

Webb-Robertson BJ, McCue LA, Lawrence CE - PLoS Comput. Biol. (2008)

Bottom Line: Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior-weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.

Affiliation: Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, Washington, United States of America. bj@pnl.gov

ABSTRACT
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. In high-D discrete spaces these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a $100(1-\alpha)\%$ credibility limit, with $0 \le \alpha \le 1$, is the minimum Hamming distance radius of a hypersphere containing $100(1-\alpha)\%$ of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior-weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
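A credibility limit of this kind can be estimated from a sample of alignments drawn from the posterior. The Python sketch below is illustrative only and is not the authors' implementation. It assumes each pairwise alignment is represented as a set of aligned position pairs (i, j), so the Hamming distance between two alignments is the size of their symmetric difference. Under that assumption the centroid is the set of pairs aligned in more than half of the samples (two conflicting pairs never occur in the same alignment, so at most one of them can exceed 0.5), and the credibility limit of any estimate is the smallest radius around it that covers a (1 − α) fraction of the samples. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Tuple

# Assumed representation: an alignment is the set of aligned residue pairs (i, j),
# meaning position i of sequence 1 is aligned to position j of sequence 2.
Alignment = FrozenSet[Tuple[int, int]]


def hamming(a: Alignment, b: Alignment) -> int:
    """Hamming distance between the 0/1 pair-indicator vectors of two alignments:
    the number of aligned pairs that appear in exactly one of them."""
    return len(a ^ b)


def centroid(samples: List[Alignment]) -> Alignment:
    """Centroid estimate from posterior samples: keep every pair aligned in more
    than half of the samples. Conflicting pairs never co-occur in one alignment,
    so at most one of them can pass the 0.5 threshold."""
    if not samples:
        raise ValueError("need at least one sampled alignment")
    counts = Counter(pair for s in samples for pair in s)
    n = len(samples)
    return frozenset(p for p, c in counts.items() if c > n / 2)


def credibility_limit(estimate: Alignment, samples: List[Alignment], alpha: float) -> int:
    """Smallest Hamming radius around `estimate` containing at least a
    (1 - alpha) fraction of the posterior samples (a Monte Carlo estimate)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not samples:
        raise ValueError("need at least one sampled alignment")
    dists = sorted(hamming(estimate, s) for s in samples)
    # Number of samples that must fall inside the radius; the small epsilon
    # guards against floating-point round-up in products such as (1 - 0.05) * 20.
    k = math.ceil((1.0 - alpha) * len(dists) - 1e-9)
    return 0 if k == 0 else dists[k - 1]


# Toy posterior sample of three alignments (hypothetical data).
samples = [
    frozenset({(0, 0), (1, 1), (2, 2)}),
    frozenset({(0, 0), (1, 1), (2, 3)}),
    frozenset({(0, 0), (2, 1)}),
]
est = centroid(samples)
print(sorted(est))                                 # [(0, 0), (1, 1)]
print(credibility_limit(est, samples, alpha=0.5))  # 1
```

The same `credibility_limit` function applies unchanged to a maximum similarity estimate; only the center of the hypersphere changes, which is what allows the two estimators to be compared on equal terms.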

Figure 3. Plot of ND95 values for the EC versus the MS estimators for 120 pairwise sequence alignments (24 comparisons of SMR4 against each of the five species in the legend). The four example alignment ND distributions displayed in Figure 4 are indicated by a letter next to the corresponding symbol.

We also examined the credibility limits of the maximum similarity (MS) and centroid (EC) estimators for local alignments of orthologous pairs of intergenic regions (up to 500 bp upstream of orthologous genes) from six Shewanella species for which complete genome sequences are available: 1) S. denitrificans OS217 (DENI), 2) S. loihica PV-4 (SPV4), 3) S. oneidensis MR-1 (SONE), 4) S. putrefaciens CN-32 (CN32), 5) Shewanella sp. MR-4 (SMR4), and 6) Shewanella sp. MR-7 (SMR7). We chose SMR4 as the base species and aligned the orthologous sequence from each of the other five species to the corresponding SMR4 region. In order of increasing evolutionary distance from SMR4, the other species are SMR7, SONE, CN32, and finally SPV4 and DENI, which are roughly equidistant. As before, we examined the 95% quantile of the normalized distances (ND95), computed from the distances between each estimated alignment and an ensemble of alignments sampled from the posterior alignment distribution. Figure 3 shows a scatter plot of the MS ND95 versus the EC ND95 values for 24 randomly selected orthologous regions in each pairwise comparison of SMR4 with the other five species (120 comparisons in total).
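The ND95 statistic and the MS versus EC comparison can be sketched in the same way. This is a hedged illustration, not the paper's code: this section does not restate how the distances are normalized, so the normalizing constant `norm` is a hypothetical, caller-supplied parameter, and alignments are again assumed to be sets of aligned position pairs (i, j). The hypothetical helper `fraction_ec_tighter` summarizes a set of (MS ND95, EC ND95) points by the share of regions in which the centroid estimate has the smaller value, i.e. which side of the diagonal the points of a plot like Figure 3 fall on.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

# Assumed representation: an alignment is the set of aligned residue pairs (i, j).
Alignment = FrozenSet[Tuple[int, int]]


def normalized_distances(estimate: Alignment, samples: List[Alignment], norm: float) -> List[float]:
    """Hamming distance from `estimate` to each sampled alignment, divided by `norm`.
    `norm` is a hypothetical normalizing constant chosen by the caller."""
    if norm <= 0:
        raise ValueError("norm must be positive")
    return [len(estimate ^ s) / norm for s in samples]


def nd_quantile(estimate: Alignment, samples: List[Alignment], norm: float, q: float = 0.95) -> float:
    """Empirical q-quantile of the normalized distances (ND95 when q = 0.95):
    the smallest value d such that at least a fraction q of the samples lie within d."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    d = sorted(normalized_distances(estimate, samples, norm))
    if not d:
        raise ValueError("need at least one sampled alignment")
    k = max(1, math.ceil(q * len(d) - 1e-9))  # epsilon guards float round-up
    return d[k - 1]


def fraction_ec_tighter(points: Sequence[Tuple[float, float]]) -> float:
    """Given one (MS ND95, EC ND95) pair per aligned region, return the fraction
    of regions in which the EC value is strictly smaller than the MS value."""
    if not points:
        raise ValueError("need at least one region")
    return sum(ec < ms for ms, ec in points) / len(points)


# Hypothetical toy data for one region: three posterior samples, a stand-in MS
# estimate (the first sample), and the majority-vote centroid of the samples.
samples = [
    frozenset({(0, 0), (1, 1), (2, 2)}),
    frozenset({(0, 0), (1, 1), (2, 3)}),
    frozenset({(0, 0), (2, 1)}),
]
ms_est = samples[0]
ec_est = frozenset({(0, 0), (1, 1)})
point = (nd_quantile(ms_est, samples, norm=3), nd_quantile(ec_est, samples, norm=3))
print(point)                         # (1.0, 0.6666666666666666)
print(fraction_ec_tighter([point]))  # 1.0
```

On this toy region the centroid's ND95 is smaller than that of the stand-in MS estimate; with real data, one point would be computed per region for each of the 120 comparisons described above.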