Statistical power of phylo-HMM for evolutionarily conserved element detection.

Fan X, Zhu J, Schadt EE, Liu JS - BMC Bioinformatics (2007)



Affiliation: Department of Statistics, Harvard University, Boston, MA, USA. xfan@fas.harvard.edu

ABSTRACT

Background: An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated.

Results: We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. The conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model have a relatively minor influence.

Conclusion: Our results provide general guidelines on how to select the number of genomes and their evolutionary distances in comparative genomics studies, and indicate the level of power we can expect under different parameter settings.



Figure 9: Power comparison of phylo-HMM for different conservation ratio (ρ). (A) Comparing the whole ROC curves. The ROC curves shown are for different ρ as illustrated in the legend. The locations of the points correspond to a posterior probability threshold equal to 0.5. Some of the points are highlighted by crosses. The red dotted line crosses show the 1st-to-3rd quartile range. The black solid line crosses show the 95% bootstrap confidence interval of the median sensitivity and specificity. (B) The relationship between sensitivity and the conservation ratio at a fixed specificity equal to 0.9. The dots are where power was evaluated by simulation. The red whiskers at ρ = 0.3, 0.5, and ρ = 0.7 indicate the 95% bootstrap confidence interval for the median sensitivity. The red triangle indicates the power of the baseline, where ρ = 0.32.
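The quantities named in the caption can be illustrated with a minimal sketch. It assumes that a site is called conserved when its posterior probability exceeds the threshold, and that the confidence interval is a percentile bootstrap over per-simulation values; the caption does not specify either detail, and the function names and inputs below are hypothetical.

```python
import random
import statistics


def sensitivity_specificity(posteriors, is_conserved, threshold=0.5):
    """Sensitivity and specificity of per-site calls at a posterior threshold.

    Assumption: a site is called conserved when posterior > threshold.
    """
    tp = fp = tn = fn = 0
    for p, truth in zip(posteriors, is_conserved):
        called = p > threshold
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and truth:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def bootstrap_median_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median.

    Assumption: a plain percentile bootstrap; the paper may use a variant.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return medians[lo_idx], medians[hi_idx]
```

In use, `sensitivity_specificity` would be applied once per simulated alignment, and `bootstrap_median_ci` would then be applied to the resulting list of sensitivities (or specificities) to obtain whiskers of the kind drawn in the figure.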

Mentions: The conservation ratio (ρ), which is defined as the ratio of the average substitution rate of conserved sites over that of nonconserved sites, is one of the major factors determining the power of phylo-HMM. Figure 9 shows the power we can expect for a given conservation ratio under other baseline settings. For fixed specificity, the relationship between sensitivity and the conservation ratio is approximately a sigmoid-type function: $1/(1 + e^{13(\rho - 0.6)})$. The power decreases dramatically with increasing conservation ratio, especially when the conservation ratio is around 0.6. This poses a problem for the two-state phylo-HMM model, which assumes a uniform conservation ratio along a given alignment. A promoter region could be bound by several types of transcription factors, and each of these could have a different conservation ratio compared to the nonconserved background. In this case, the power evaluation from a uniform conservation ratio is questionable. This problem can be alleviated by introducing multiple rate classes to form a multiple-state phylo-HMM.
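To make the steepness around ρ = 0.6 concrete, the sigmoid approximation quoted above can be evaluated directly. This is only a sketch of the stated approximate fit (fixed specificity, baseline settings), not a substitute for the simulations; the function and parameter names are hypothetical.

```python
import math


def approx_sensitivity(rho, slope=13.0, midpoint=0.6):
    """Approximate sensitivity at fixed specificity: 1 / (1 + exp(slope * (rho - midpoint))).

    The constants 13 and 0.6 are the ones quoted in the text; the fit is
    described there as approximate and tied to the baseline settings.
    """
    return 1.0 / (1.0 + math.exp(slope * (rho - midpoint)))


# Evaluate the fit at the conservation ratios highlighted in Figure 9(B).
for rho in (0.3, 0.5, 0.6, 0.7):
    print(f"rho = {rho:.2f}  approx. sensitivity = {approx_sensitivity(rho):.3f}")
```

Under this fit the sensitivity is exactly one half at ρ = 0.6 and changes fastest there, which is why a modest difference in conservation ratio between elements in the same alignment can translate into a large difference in detection power.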

