A human genome-wide library of local phylogeny predictions for whole-genome inference problems.

Sridhar S, Schwartz R - BMC Genomics (2008)

Bottom Line: The imperfection statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.


Affiliation: Department of Biological Sciences, Carnegie Mellon University, USA. srinath@cs.cmu.edu

ABSTRACT

Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding.

Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history.

Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.


Figure 4: Coincidence of imperfection and fine-scale recombination rate for chromosome 21. Imperfection scores are shown as solid grey bars mapped to the position of the central SNP of the corresponding window. Fine-scale recombination rates, supplied by the HapMap web site [2], are marked by dashed black lines. CEU data appear above the x-axis and YRI data below the x-axis.

Mentions: We next sought to demonstrate how a library of inferred phylogenies would be useful in making genome-scale predictions of genomic features that indirectly depend on local evolutionary histories. We chose the example of fine-scale recombination rate, continuing with the imperfection statistic as a hypothetical predictor of that rate. Recombination rate might be expected to correlate with phylogeny size because recombination events will be misinterpreted as multiple recurrent mutation events, and the imperfection statistic should therefore tend to be large where recombination has been frequent. While we do not have access to the ground truth for recombination rate, we can test our ability to predict an accepted inference of the recombination rate that was performed for the HapMap by the method of McVean et al. [4,35]. Figure 4(a) illustrates the correlation between local imperfection and the previously inferred fine-scale recombination rates for chromosome 21. We chose chromosome 21 for visualization purposes because it is small enough that fine-scale features can still be discerned in a whole-chromosome plot. The image reveals that spikes in local recombination rate generally correspond with spikes in local phylogenetic imperfection. Conversely, regions of sustained low recombination rate, such as that observed around 28 Mb, appear to correspond to generally low imperfection. Nonetheless, many peaks in phylogenetic imperfection coincide with low inferred recombination rates. Figure 4(b) examines only the windows that fall outside recombination hotspots, further showing that high imperfection can occur in regions of low recombination rate. This observation suggests that the phylogenetic imperfection measure detects both recombination and other sources of large phylogenies, most likely recurrent mutation.
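
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: imperfection is taken to be the number of mutation events in a local tree minus the number of distinct variant sites (so a perfect phylogeny scores 0), each window is represented by a hypothetical list of the site mutated on each tree edge, the recombination rates are made-up values, and Spearman rank correlation is used only as one plausible way to quantify the association, since the excerpt does not say which test was applied.

```python
from collections import Counter


def imperfection(edge_sites):
    """Count reused variant sites in one local phylogeny.

    edge_sites lists the variant site that mutates on each tree edge.
    Assumption: imperfection = (mutation events) - (distinct sites),
    so a tree in which every site mutates exactly once scores 0.
    """
    counts = Counter(edge_sites)
    return sum(c - 1 for c in counts.values())


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy windows: one site label per mutation edge in the tree.
windows = [
    ["s1", "s2", "s3"],              # every site used once
    ["s1", "s2", "s2", "s3"],        # s2 reused once
    ["s1", "s1", "s2", "s2", "s2"],  # s1 reused once, s2 reused twice
    ["s1", "s2"],                    # every site used once
]
# Hypothetical recombination rates for the same windows (illustrative only).
recomb_rates = [0.1, 1.5, 4.0, 0.3]

scores = [imperfection(w) for w in windows]
rho = spearman(scores, recomb_rates)
print(scores, rho)
```

A rank-based measure is used in this sketch because recombination rates are dominated by a few hotspot spikes, which would otherwise drive a linear correlation; with real data, the per-window lists would come from the inferred phylogenies and the rates from the HapMap estimates.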

