8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage.

Rands CM, Meader S, Ponting CP, Lunter G - PLoS Genet. (2014)

Bottom Line: While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.


Affiliation: MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford, United Kingdom.

ABSTRACT
Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25-0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). From extrapolations we estimate that 8.2% (7.1-9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
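
The half-life figures quoted above can be made concrete with a short calculation. The sketch below assumes that constrained sequence is lost at a constant rate per unit of divergence (simple exponential turnover); that model, the function names, and the example divergence values are assumptions made for illustration, not the authors' fitting procedure. Only the d1/2 ranges are taken from the abstract.

```python
import math


def retained_fraction(divergence: float, half_life: float) -> float:
    """Fraction of constrained sequence still shared after `divergence`.

    Assumes simple exponential turnover (an assumption of this sketch):
    the retained fraction halves for every `half_life` units of divergence.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return 0.5 ** (divergence / half_life)


def turnover_rate(half_life: float) -> float:
    """Per-unit-divergence loss rate implied by a half-life: ln(2) / d1/2."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


# d1/2 ranges quoted in the abstract (units of divergence time).
D_HALF = {
    "noncoding": (0.25, 0.31),
    "protein coding": (2.1, 5.0),
}

# Hypothetical divergences, chosen only to illustrate the contrast.
EXAMPLE_DIVERGENCES = (0.1, 0.3, 0.5)

for label, (low, high) in D_HALF.items():
    for d in EXAMPLE_DIVERGENCES:
        # A shorter half-life gives the lower bound on what is retained.
        print(
            f"{label:>14}  d={d:.1f}  retained between "
            f"{retained_fraction(d, low):.2f} and {retained_fraction(d, high):.2f}"
        )
```

Under this assumption, noncoding constrained sequence falls to half after one half-life of divergence, while protein coding sequence remains almost entirely shared over the same range, which is the contrast the abstract describes.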


Figure 2 (pgen-1004525-g002): The overlap of constrained sequence with pan-mammalian conserved sequences. The proportions (A) and quantities (B) of constrained sequence at the present for different types of biochemically annotated and un-annotated sequences, with and without PhastCons or GERP++ conserved elements, estimated using linear extrapolations (Text S6, Text S7). NIM1 has power to detect functional lineage-specific constrained sequence: NIM1 detects significantly higher fractions of lineage-specific constrained sequence (defined as sequence identified by NIM1 but not annotated by PhastCons or GERP++ as being conserved across mammals) within 3 mutually exclusive classes of ENCODE biochemical annotations compared to sequence lacking such annotation; see Text S6 for details.

Mentions: To exclude the possibility that technical artefacts are driving this observation, we investigated ENCODE annotations in lineage-specific NIM1-constrained sequence. Specifically, we identified NIM1-constrained sequence that was not identified as pan-mammalian conserved by either the PhastCons [12] or GERP++ algorithms [19], and found that such sequence is enriched for biochemically annotated sequences (DNase HSs, TFBSs, and enhancers defined by the ENCODE consortium [5]) (Figure 2; Figure S4). This is expected if functional elements, including these ENCODE functional classes, have been subject to evolutionary turnover, but is not expected if technical artefacts were causing the observations in Figure 1. Furthermore, using low-frequency polymorphic indels from the 1000 Genomes project we could exclude the possibility that lower mutation rates in ENCODE functional regions were causing the observations. We therefore conclude that observations in Figure 1 reflect turnover of functional elements. A more detailed discussion on this issue is provided in Text S6 and Text S7.
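
The comparison described above can be sketched on toy data. The example below defines lineage-specific constrained sequence as bases that are constrained but not covered by a pan-mammalian conserved element, then compares its fraction inside and outside an annotated region. All coordinates, variable names, and helper functions are hypothetical. The sketch performs no significance test, and it does not reproduce NIM1, PhastCons, or GERP++ themselves.

```python
def to_positions(intervals):
    """Expand half-open (start, end) intervals into a set of base positions."""
    return {pos for start, end in intervals for pos in range(start, end)}


def lineage_specific_fraction(constrained, conserved, region):
    """Fraction of bases in `region` that are constrained but not conserved."""
    if not region:
        raise ValueError("region must contain at least one base")
    lineage_specific = constrained - conserved
    return len(lineage_specific & region) / len(region)


# Toy coordinates on a single hypothetical 200-base sequence.
constrained = to_positions([(0, 50), (100, 110)])  # stand-in for NIM1 calls
conserved = to_positions([(0, 20)])                # stand-in for PhastCons/GERP++ elements
annotated = to_positions([(0, 100)])               # stand-in for an ENCODE annotation class
unannotated = to_positions([(100, 200)])           # sequence lacking such annotation

frac_annotated = lineage_specific_fraction(constrained, conserved, annotated)
frac_unannotated = lineage_specific_fraction(constrained, conserved, unannotated)

print(f"lineage-specific constrained fraction, annotated:   {frac_annotated:.2f}")
print(f"lineage-specific constrained fraction, unannotated: {frac_unannotated:.2f}")
```

A higher fraction in the annotated region than in the unannotated one is the pattern the passage reports as enrichment. Expanding intervals into per-base sets keeps the sketch short but would not scale to a whole genome, where an interval-arithmetic tool would be the usual choice.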

