A human genome-wide library of local phylogeny predictions for whole-genome inference problems.

Sridhar S, Schwartz R - BMC Genomics (2008)

Bottom Line: This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.


Affiliation: Department of Biological Sciences, Carnegie Mellon University, USA. srinath@cs.cmu.edu

ABSTRACT

Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding.

Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history.

Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.


Figure 7: Histograms of imperfection for repetitive regions versus all windows. Each plot shows a comparison of windows centered on any repetitive region (solid line and '+'), windows centered on short tandem repeats (long dash and 'x'), and all windows (medium dash and '*'). (a) Data from CEU. (b) Data from YRI.

Mentions: A bias in imperfection scores might also be expected for SNPs found in repetitive regions of the genome. We might anticipate some excess of imperfection in this set from a greater frequency of genotyping errors or genome misassembly around repetitive elements. We might also anticipate a higher fraction of large imperfection scores due to genuine hypermutable sites, which are known to be associated with some short tandem repeat (STR) regions [36,37]. We therefore compared the set of all windows with those whose central SNP falls in any repetitive region. We also separately examined windows whose central SNPs overlap STR regions. Figures 7(a) and 7(b) show the results for the CEU and YRI populations. The graphs show hardly any differences between the data sets for the well-populated imperfection values. Comparing all repeat windows versus all windows, we find that the frequency of perfect windows is nearly identical (77.6% versus 77.9% for CEU and 65.1% in both data sets for YRI). STR SNPs also do not show pronounced differences from general windows, although they are slightly less likely to be perfect (76.1% versus 77.9% for CEU and 62.3% versus 65.1% for YRI). It therefore appears that repetitive elements do not lead to any dramatic systematic bias in local phylogenetic imperfection.
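
The comparison described above amounts to computing, for each set of windows, the fraction whose imperfection is zero. The following is a minimal sketch of that bookkeeping, not the authors' code. It assumes that a "perfect" window is one with an imperfection score of 0, and that each window is given as a pair of its score and the set of categories it belongs to. The function name `perfect_fraction`, the category labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def perfect_fraction(windows):
    """Return {category: fraction of windows with imperfection 0}.

    `windows` is an iterable of (imperfection, categories) pairs, where
    `categories` is a set of labels such as "all", "repeat", or "str".
    This input layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    perfect = defaultdict(int)
    for imperfection, categories in windows:
        for category in categories:
            totals[category] += 1
            # Assumption: a perfect window has imperfection exactly 0.
            if imperfection == 0:
                perfect[category] += 1
    return {category: perfect[category] / totals[category] for category in totals}


# Toy input (invented values, not data from the paper).
windows = [
    (0, {"all"}),
    (0, {"all", "repeat"}),
    (1, {"all", "repeat", "str"}),
    (0, {"all", "repeat", "str"}),
    (2, {"all"}),
]
fractions = perfect_fraction(windows)
for category in sorted(fractions):
    print(f"{category}: {fractions[category]:.1%} perfect")
```

Running the same function separately on windows from each population would give per-population percentages of the kind quoted in the text, under the assumptions stated above.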

