Statistical power of phylo-HMM for evolutionarily conserved element detection.

Fan X, Zhu J, Schadt EE, Liu JS - BMC Bioinformatics (2007)

Bottom Line: In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model are relatively minor factors. Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.


Affiliation: Department of Statistics, Harvard University, Boston, MA, USA. xfan@fas.harvard.edu

ABSTRACT

Background: An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated.

Results: We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model are relatively minor factors.

Conclusion: Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.


Figure 7: Influence of substitution model type. This graph compares the power of phylo-HMM for (A) different substitution model types (JC, F81 with baseline π, HKY with kappa = 4 and baseline π, REV with baseline π and rate matrix), and (B) simulations carried out using the REV model and then estimations carried out using the JC and REV models. The curves again represent the ROC curves as defined in the legend. The crosses highlight points corresponding to the 0.5 posterior probability threshold. The small solid line crosses in (B) show the 95% bootstrap confidence interval of the median sensitivity and specificity. Other crosses show the 1st-to-3rd quartile range.
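The crosses at the 0.5 posterior probability threshold can be read as single points on an ROC curve. The following is a minimal sketch of how such a point might be computed, assuming site-level sensitivity and specificity in their standard form; the figure legend's exact definition is not reproduced here and may differ. The function names, the `>=` comparison, and the toy data are illustrative assumptions, not taken from the paper.

```python
def sensitivity_specificity(posteriors, truth, threshold=0.5):
    """Classify each site as conserved when its posterior >= threshold,
    then compare against the true (simulated) state of each site.

    posteriors: posterior probability of the conserved state per site
    truth:      True where the site was simulated as conserved
    """
    tp = fp = tn = fn = 0
    for p, is_conserved in zip(posteriors, truth):
        predicted = p >= threshold
        if predicted and is_conserved:
            tp += 1
        elif predicted and not is_conserved:
            fp += 1
        elif not predicted and is_conserved:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def roc_points(posteriors, truth, thresholds):
    """Return (false positive rate, sensitivity) pairs, one per threshold."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(posteriors, truth, t)
        points.append((1.0 - spec, sens))
    return points


# Toy data (hypothetical): six sites, three truly conserved.
toy_posteriors = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1]
toy_truth = [True, True, True, False, False, False]
sens_half, spec_half = sensitivity_specificity(toy_posteriors, toy_truth, 0.5)
curve = roc_points(toy_posteriors, toy_truth, [0.0, 0.25, 0.5, 0.75, 1.0])
```

Sweeping the threshold from 0 to 1 traces the whole curve, while the 0.5 threshold gives the single operating point that the crosses mark.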

Mentions: The first experiment compared the power of the various model types. To compare them on common ground, we used the same parameter values as in the baseline, with the exception of the substitution rate matrix for each model type. For JC, we used the uniform background nucleotide probability. For HKY, we set kappa = 4. For sequences simulated from each model type, we used the corresponding true model type to infer the state sequence. We repeated the simulation-prediction procedure 1000 times to obtain the ROC curve for each model type. The results are depicted in Figure 7(A) and show that the simpler models (JC and F81) performed slightly better than the more complex models (HKY and REV), which agrees with our intuition that simpler models are easier to solve. At the parameter values set by mimicking these 8533 alignments, however, the observed differences are small.
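To make the relationship between the simpler model types concrete, here is a minimal sketch of how JC, F81 and HKY rate matrices can be built from one function, using the standard textbook definitions: HKY with kappa = 1 reduces to F81, and F81 with uniform background probabilities reduces to JC. The function names, the A, C, G, T ordering, the normalization to one expected substitution per unit time, and the example π values are assumptions for illustration; the paper's baseline π is not given in this excerpt. REV is omitted because it needs a full exchangeability matrix.

```python
NUCS = "ACGT"          # assumed nucleotide order
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(pi, kappa):
    """Build a 4x4 HKY rate matrix Q with q_ij proportional to pi_j,
    multiplied by kappa for transitions. Rows sum to zero, and Q is
    scaled so the expected substitution rate at equilibrium is 1."""
    n = len(NUCS)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(NUCS):
        for j, y in enumerate(NUCS):
            if i != j:
                q[i][j] = pi[j] * (kappa if is_transition(x, y) else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[v / rate for v in row] for row in q]


def f81_rate_matrix(pi):
    """F81 is HKY without a transition/transversion bias (kappa = 1)."""
    return hky_rate_matrix(pi, 1.0)


def jc_rate_matrix():
    """JC is F81 with uniform background nucleotide probabilities."""
    return f81_rate_matrix([0.25, 0.25, 0.25, 0.25])


example_pi = [0.3, 0.2, 0.2, 0.3]          # hypothetical, not the paper's baseline
q_jc = jc_rate_matrix()
q_f81 = f81_rate_matrix(example_pi)
q_hky = hky_rate_matrix(example_pi, 4.0)   # kappa = 4 as stated in the text
```

Because each model is a special case of the next, changing model type while holding the other baseline parameters fixed amounts to changing only the rate matrix, which is the comparison the experiment describes.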

