Statistical power of phylo-HMM for evolutionarily conserved element detection.

Fan X, Zhu J, Schadt EE, Liu JS - BMC Bioinformatics (2007)

Bottom Line: In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model have relatively minor influence. Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.


Affiliation: Department of Statistics, Harvard University, Boston, MA, USA. xfan@fas.harvard.edu

ABSTRACT

Background: An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated.

Results: We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model have relatively minor influence.

Conclusion: Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.



Figure 6: Influence of the number of genomes for the symmetric star topology. This graph illustrates the relationship between one over the number of genomes and the individual branch lengths. The specificity is fixed at 0.95 in all cases. Each curve corresponds to a given sensitivity (Sn) level as illustrated in the legend.

Mentions: The number of species to choose is another important problem to consider in comparative genomics studies. Based on conclusions from previous sections, we investigated this problem using a symmetric star topology and the common baseline settings, with the exception of the branch lengths (Figure 6). Assuming each branch is as long as the distance between mouse and rat, 6 genomes are needed to achieve a sensitivity of 0.90 at a specificity of 0.95. Adding 4 more genomes increases the sensitivity to 0.95 at the same specificity. For shorter branch lengths (e.g., < 0.2, like the distance between mouse and rat), the number of genomes required to achieve a desired level of power scales inversely with the individual branch length. This scaling property of phylo-HMM is similar to that established for the individual alignment block model [9].
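The inverse scaling described above can be illustrated with a short calculation. This is only a minimal sketch, not the simulation procedure of the paper: it assumes the relationship N × b ≈ constant holds exactly for short branches, and the function name `genomes_needed` and the reference branch length of 0.2 are hypothetical placeholders chosen for illustration (the text only indicates that the mouse-rat distance falls in the short-branch range).

```python
import math


def genomes_needed(ref_genomes: int, ref_branch: float, new_branch: float) -> int:
    """Extrapolate the number of genomes needed at a different branch length.

    Assumes the inverse scaling N_new * b_new ~= N_ref * b_ref, which the text
    reports only for short branches in a symmetric star topology. The result
    is rounded up to a whole number of genomes.
    """
    if ref_genomes <= 0 or ref_branch <= 0 or new_branch <= 0:
        raise ValueError("genome count and branch lengths must be positive")
    estimate = ref_genomes * ref_branch / new_branch
    # Round away floating-point noise before taking the ceiling, so that an
    # exact ratio such as 12.000000000000002 is not bumped up to 13.
    return math.ceil(round(estimate, 9))


# Hypothetical reference point: 6 genomes at an assumed branch length of 0.2.
REF_GENOMES = 6
REF_BRANCH = 0.2  # assumed value, for illustration only

# Halving the branch length doubles the estimated number of genomes.
print(genomes_needed(REF_GENOMES, REF_BRANCH, 0.1))
```

Under these assumptions, halving the branch length doubles the estimated number of genomes. The extrapolation should not be trusted for longer branches, where the text does not claim that the inverse relationship holds.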

