Statistical power of phylo-HMM for evolutionarily conserved element detection.

Fan X, Zhu J, Schadt EE, Liu JS - BMC Bioinformatics (2007)

Bottom Line: In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model are relatively minor factors. Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.


Affiliation: Department of Statistics, Harvard University, Boston, MA, USA. xfan@fas.harvard.edu

ABSTRACT

Background: An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated.

Results: We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the topology and the nucleotide substitution model are relatively minor factors.

Conclusion: Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.


Figure 10: The relationship between the column score and branch length. (A) Relationships for branches at different locations in the baseline phylogenetic tree. The different lines represent the different branches as illustrated in the legend. (B) Relationships for branches in the symmetric star-topology tree. The different lines correspond to the different numbers of genomes (n) represented by the tree.

Mentions: We evaluated the alignment accuracy for all situations reported in the previous sections. The column score was higher than 0.99 for the baseline case and in all cases where the topology, substitution rate model, HMM parameters, and the conservation ratio were varied. Figure 10 shows the influence of branch length and the number of genomes. For the baseline phylogenetic tree, the column score decreases below 0.99 if any of the leaf branches is longer than 1 substitution per site or the middle branch is longer than 0.6 substitutions per site. For the symmetric star-topology tree, if the number of genomes is 4, the column score decreases below 0.99 if the single branch length is longer than 0.5 substitutions per site. As the number of genomes increases, the column score decreases below 0.99 at shorter branch lengths, which implies it is harder to recover the true alignment if the number of genomes increases. All of these results suggest that no branch length should be longer than 1 substitution per site (e.g., the distance between dog and rat) in order to realize an accurate alignment. If no branch is too long, which holds around the baseline setting, the assumption that a highly accurate alignment is available is valid.
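
The text above reports column scores without defining them. As a rough illustration only, the sketch below assumes the common definition of the column score: the fraction of columns in the true (reference) alignment that are reproduced exactly in the inferred alignment. The function names `columns` and `column_score` and the toy alignments are hypothetical and are not taken from the paper, which may have computed the score differently.

```python
def columns(alignment):
    """Return the set of columns of an alignment.

    Each column is a tuple holding, for every sequence, the index of the
    residue placed in that column, or None where the sequence has a gap.
    `alignment` is a list of equal-length strings using '-' for gaps.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all alignment rows must have the same length")
    next_residue = [0] * len(alignment)  # residues consumed so far per row
    cols = set()
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(next_residue[i])
                next_residue[i] += 1
        if any(x is not None for x in col):  # ignore all-gap columns
            cols.add(tuple(col))
    return cols


def column_score(reference, test):
    """Fraction of reference columns recovered exactly in the test alignment
    (assumed definition, see the lead-in)."""
    ref_cols = columns(reference)
    if not ref_cols:
        raise ValueError("reference alignment has no non-gap columns")
    return len(ref_cols & columns(test)) / len(ref_cols)


# Toy example: the inferred alignment misplaces one gap in the third sequence.
true_aln = ["ACGT", "ACGT", "A-GT"]
inferred_aln = ["ACGT", "ACGT", "AG-T"]
score = column_score(true_aln, inferred_aln)
print(score)
```

In this toy case two of the four true columns are recovered, so the score is 0.5. Under this definition, the 0.99 threshold used in the text would mean that at most 1% of the true columns are misaligned.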

