8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage.

Rands CM, Meader S, Ponting CP, Lunter G - PLoS Genet. (2014)

Bottom Line: While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.


Affiliation: MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford, United Kingdom.

ABSTRACT
Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25-0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). From extrapolations we estimate that 8.2% (7.1-9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
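
The half-life figures quoted above can be made concrete with a small calculation. The sketch below assumes that constrained sequence decays exponentially with divergence, so that the fraction still shared after divergence d is 0.5^(d / d1/2); this simple form is an assumption of the sketch, not a restatement of the authors' model. The function names and the example divergence of 0.5 are hypothetical, and only the d1/2 ranges are taken from the abstract.

```python
# Illustrative only: simple exponential-decay reading of a half-life d1/2.
# The decay form, function names, and example divergence are assumptions;
# only the d1/2 ranges below are quoted from the abstract.

NONCODING_D_HALF = (0.25, 0.31)  # noncoding constrained sequence, from the abstract
CODING_D_HALF = (2.1, 5.0)       # protein coding sequence, from the abstract


def retained_fraction(divergence: float, half_life: float) -> float:
    """Fraction of constrained sequence still shared after `divergence`.

    Both arguments must be in the same divergence units as d1/2.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return 0.5 ** (divergence / half_life)


def retained_range(divergence: float, half_life_range: tuple) -> tuple:
    """Retained fraction at the lower and upper ends of a d1/2 range."""
    low, high = half_life_range
    return (retained_fraction(divergence, low),
            retained_fraction(divergence, high))


if __name__ == "__main__":
    d = 0.5  # hypothetical divergence, chosen only for illustration
    print("noncoding:", retained_range(d, NONCODING_D_HALF))
    print("coding:   ", retained_range(d, CODING_D_HALF))
```

Under this assumption, a divergence equal to twice the lower noncoding half-life leaves a quarter of the noncoding constrained sequence shared, whereas coding sequence at the same divergence remains largely intact.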


Evolutionary turnover of constrained sequence. A. Quantity of constrained sequence (αselIndel) estimated by NIM1 (blue bars) and NIM2 (red bars) plotted against ancestral repeat divergence for different pairs of eutherian species genomes, with the simulated data (grey) shown under a non-turnover scenario. B. Coding sequence (blue squares) is seen to be broadly conserved, while constrained noncoding sequence (orange circles) shows a strong negative correlation between αselIndel and divergence, indicating rapid turnover.

Mentions: We developed three improvements for estimating αselIndel. First, we identified two issues in the original derivation of the NIM1 model, and found that the corrections result in equal but opposite changes in the inferred αselIndel, so that these issues do not invalidate the original results (Text S1). To provide further assurance of the accuracy of the derivation, we introduced a new likelihood neutral indel model (NIM2) that provides a partially independent validation of the revised NIM1 estimates (Text S2). Second, we found that earlier αselIndel estimates were upwardly biased as a consequence of poor-quality alignments (Materials and Methods; Text S3; Figure S1; Figure S2). Third, we significantly extended the original simulation study, testing the influence of a wide range of modelling assumptions on the inferences. The results underscored the validity, accuracy and robustness of the model (Text S4; Text S5; Figure 1A; Figure S3).

