8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage.

Rands CM, Meader S, Ponting CP, Lunter G - PLoS Genet. (2014)

Bottom Line: While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequence we identify is not detected by models optimized for wider pan-mammalian conservation. By contrast, protein-coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.

Affiliation: MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford, United Kingdom.

ABSTRACT
Ten years after the human reference genome sequence was finished, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much of it is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25-0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequence we identify is not detected by models optimized for wider pan-mammalian conservation. Constrained DNase I hypersensitive sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci, which have turned over especially rapidly. By contrast, protein-coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). From extrapolations we estimate that 8.2% (7.1-9.2%) of the human genome is presently subject to negative selection and is thus likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged.
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
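
As a rough illustration only, the half-lives above can be read through a single-exponential decay in which the fraction of present-day constrained sequence still shared at divergence d is 0.5^(d / d1/2). That functional form, the function name fraction_shared and the evaluation point d = 1.00 (the text places the human-monotreme split at AR divergence time 1.00) are assumptions of this sketch, not the authors' fitted turnover model. At d = 1.00 it predicts less shared noncoding sequence than the model-based human-monotreme estimate given in the main text, so it should not be used to reproduce the paper's numbers. The last printed line is plain arithmetic on the 2.2% and 8.2% figures.

```python
"""Reading a turnover half-life as a simple exponential decay.

ASSUMPTION: the fraction of present-day constrained sequence still shared at
divergence d is taken to be 0.5 ** (d / d_half). This single-exponential form
is an illustration only; it is not the authors' fitted turnover model.
"""


def fraction_shared(d: float, d_half: float) -> float:
    """Fraction of constrained sequence still shared after divergence d."""
    if d < 0 or d_half <= 0:
        raise ValueError("need d >= 0 and d_half > 0")
    return 0.5 ** (d / d_half)


# Half-life ranges quoted in the abstract (units of divergence time).
HALF_LIVES = {"noncoding": (0.25, 0.31), "coding": (2.1, 5.0)}

if __name__ == "__main__":
    d = 1.00  # illustrative evaluation point (assumed, see lead-in)
    for label, (lo, hi) in HALF_LIVES.items():
        print(f"{label}: {fraction_shared(d, lo):.2f}-"
              f"{fraction_shared(d, hi):.2f} shared at d={d}")
    # Share of present-day constraint (8.2%) also held in mouse (2.2%).
    print(f"shared with mouse: {2.2 / 8.2:.0%} of present-day constraint")
```

Under this naive reading, about 0.06-0.11 of noncoding and about 0.72-0.87 of coding constrained sequence would still be shared at d = 1.00, and the 2.2% retained in both human and mouse is about 27% of the 8.2% presently constrained.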

Figure 4. Model-based inference of turnover by functional class. Schematic summary of the fraction of constrained sequence that has been retained (saturated colours) or turned over (pastel colours) in the human lineage over time (X-axis, divergence time) and how it has been distributed across various categories of functional element. In addition to showing the reduced quantity of preserved constrained sequence with increasing divergence, we infer the reciprocal quantity of sequence that is assumed to have been gained over human lineage evolution. For consistency, this approach requires mutually exclusive annotation sets, unlike those used in Figure 3, so the results are not directly comparable. Overlaps between the major annotation classes are shown in Figure S10.

Mentions: We next examined how constrained sequence in the human genome is distributed cumulatively across selected functional element categories. We do this by fitting the functional turnover model to the observed data and extrapolating to the present day. In this way we also infer the reciprocal quantities of sequence that, when compared with another species or a human ancestor at a particular divergence, are presently functional in human yet have lost (or not gained) constraint in the lineage leading to that ancestor or other species (Figure 4). We stress that this inference relies on the parsimonious, but not formally justified, assumption that the total quantity of functional sequence in genomes remains constant over time, and therefore across species and within functional categories. With these caveats, we estimate that 8.6 Mb (26%) of constrained coding sequence has lost constraint (and thus has turned over) since humans diverged from monotremes approximately 228 million years ago (AR divergence time 1.00), whereas 200 Mb (79%) of the constrained noncoding human genome is inferred to have lost constraint over the same period. DNase HSs cover more indel-constrained sequence at all divergence ranges than all other annotated noncoding sequence combined, implying that DNase HSs are an abundant and informative biochemical marker of functionality outside protein-coding regions. Enhancers also contribute markedly to the constrained human genome, whereas TFBSs, promoters, UTRs and lncRNAs contribute considerably less sequence once their overlap with other annotations is removed. Finally, about a quarter of the sequence inferred to be presently under constraint is not covered by any of the annotation categories we considered. In Figure 4 we sum the quantities of constrained sequence estimated from independent NIM1 runs for different annotation types.
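
The percentages quoted with the 8.6 Mb and 200 Mb figures also fix, by simple division, the present-day totals they are percentages of. The sketch below does that arithmetic and then applies the constant-total assumption stated above, under which the amount of constraint gained along one lineage equals the amount lost. The function names implied_total and shared_and_gained are hypothetical, and the outputs are back-of-envelope derivations from the quoted numbers, not results of the NIM1 runs.

```python
"""Back-of-envelope arithmetic on the quoted Figure 4 turnover figures.

The inputs are the (Mb lost, fraction lost) pairs stated in the text for the
human-monotreme split; everything else is derived by division and by the
constant-total assumption, not by the authors' NIM1 model.
"""


def implied_total(lost_mb: float, lost_fraction: float) -> float:
    """Present-day constrained Mb implied by 'lost_mb is lost_fraction of it'."""
    if not 0.0 < lost_fraction <= 1.0:
        raise ValueError("lost_fraction must be in (0, 1]")
    return lost_mb / lost_fraction


def shared_and_gained(total_mb: float, lost_mb: float) -> tuple[float, float]:
    """Split a present-day total into shared and turned-over sequence.

    ASSUMPTION (stated in the text): the total amount of functional sequence
    is constant, so constraint gained along one lineage exactly balances
    constraint lost, i.e. gained == lost.
    """
    if not 0.0 <= lost_mb <= total_mb:
        raise ValueError("need 0 <= lost_mb <= total_mb")
    shared = total_mb - lost_mb
    gained = lost_mb  # holds only under the constant-total assumption
    return shared, gained


# (Mb lost, fraction lost) since the human-monotreme split, as quoted.
QUOTED = {"coding": (8.6, 0.26), "noncoding": (200.0, 0.79)}

if __name__ == "__main__":
    for label, (lost_mb, frac) in QUOTED.items():
        total = implied_total(lost_mb, frac)
        shared, gained = shared_and_gained(total, lost_mb)
        print(f"{label}: total ~{total:.1f} Mb, shared ~{shared:.1f} Mb, "
              f"gained ~{gained:.1f} Mb")
```

By this arithmetic the quoted percentages correspond to roughly 33 Mb of constrained coding sequence and roughly 253 Mb of constrained noncoding sequence in the present-day human genome.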

