Proto-genes and de novo gene birth.

Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, Brar GA, Weissman JS, Regev A, Thierry-Mieg N, Cusick ME, Vidal M - Nature (2012)

Bottom Line: In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or 'non-genic' sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.

Affiliation: Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.

ABSTRACT
Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or 'non-genic' sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ~1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.
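
The abstract's central objects are open reading frames (ORFs) located in non-genic sequence. As a rough illustration of what counts as an ORF, the Python sketch below scans the forward strand of a DNA string in all three reading frames and reports stretches that begin at an ATG and end at the first in-frame stop codon. The function `find_orfs` and its `min_codons` cut-off are hypothetical; the study's actual ORF definition (minimum length, handling of the reverse strand and of overlapping ORFs) is not specified in this section.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Hypothetical helper: `end` is exclusive and includes the stop codon.
    Only the forward strand is scanned, and ATGs inside an already open ORF
    are not reported separately, so each stop codon closes at most one
    (the longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):  # only complete codons
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # sense codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next start codon
    return orfs


short_orf = find_orfs("ATGAAATAGCCCTAA")   # in-frame TAG closes the ORF early
longer_orf = find_orfs("ATGAAATGGCCCTAA")  # TAG -> TGG: stop codon lost
```

Changing the in-frame TAG of the first toy sequence to TGG removes that stop codon, so the ORF now runs to the next in-frame stop (TAA); this is the stop-codon-loss route to ORF lengthening that the paper mentions.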


Figure 2: Existence of an evolutionary continuum ranging from non-genic ORFs to genes through proto-genes. a, Length (top; error bars represent s.e.m.), RNA expression level (middle; error bars represent s.e.m.) and proximity to transcription factor binding sites (bottom; error bars represent standard error of the proportion) of ORFs correlate with conservation level. P and tau: Kendall's correlation statistics. RNA abundance was estimated from RNA-seq (ref. 25) in rich conditions. The positive correlation between proximity to transcription factor binding sites and conservation level is shown for a window of 200 nucleotides and holds for windows of 300, 400 and 500 nucleotides (Kendall's tau = 0.14, 0.16 and 0.17, respectively; P < 2.2 × 10⁻¹⁶ in each case). b, Codon bias, estimated using the codon adaptation index (Supplementary Information), increases with conservation level. P and tau: Kendall's correlation statistics. Error bars represent s.e.m. The large s.e.m. observed for ORFs5 may be related to the whole-genome duplication event (Supplementary Fig. 3). c, Relative amino acid abundances shift with increasing conservation level. For each encoded amino acid, the ratio between its frequency in ORFs1-4 and its frequency in ORFs5-10 (gray), or the ratio between its frequency in ORFs1-4 and its frequency in ORFs0 (black), is plotted. Enrichment of cysteine in proteins encoded by ORFs1-4 relative to those encoded by ORFs5-10 (P < 1.8 × 10⁻¹⁵⁰, hypergeometric test) corresponds to 3.6 ± 0.1 residues (mean ± s.e.m.) per translation product. d, Predicted structural features of ORF translation products correlate with conservation level. ORFs0 were not included in these analyses because their short length makes structural predictions unreliable. Error bars represent s.e.m.
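
Panels b and c rest on two standard calculations. The Python sketch below shows the codon adaptation index as the geometric mean of per-codon relative adaptiveness weights (conventionally derived from codon usage in highly expressed genes), and a one-sided hypergeometric tail probability of the kind used for the cysteine enrichment test. The function names, the weight table and the toy numbers are hypothetical; the paper's reference codon set and exact test setup are described in its Supplementary Information, not here. One plausible mapping for panel c, an assumption, is: population = all residues encoded by ORFs1-4 and ORFs5-10, successes = cysteines among them, draws = residues encoded by ORFs1-4, k = cysteines observed there.

```python
import math


def codon_adaptation_index(orf_seq, weights):
    """Codon adaptation index of one ORF: the geometric mean of the
    relative adaptiveness weights w (0 < w <= 1) of its codons.

    `weights` is a hypothetical codon -> w table; codons missing from it
    (for example stop codons) are skipped rather than counted.
    """
    codons = [orf_seq[i:i + 3] for i in range(0, len(orf_seq) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        return float("nan")  # no scorable codons
    return math.exp(sum(logs) / len(logs))


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Exact integer arithmetic via math.comb; practical only for modest counts.
    """
    total = math.comb(population, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(max(k, 0), min(successes, draws) + 1)
    )
    return hits / total


# Toy usage with made-up numbers:
w = {"GCT": 1.0, "GCC": 0.5, "AAA": 0.25}     # hypothetical weights
cai = codon_adaptation_index("GCTGCCAAA", w)  # geometric mean of 1.0, 0.5, 0.25
p = hypergeom_upper_tail(5, 10, 5, 5)         # all 5 draws are successes
```

With genome-scale residue counts the exact sum becomes slow; `scipy.stats.hypergeom.sf(k - 1, M, n, N)` returns the same upper tail P(X >= k) using floating-point methods.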

Mentions: To test the evolutionary continuum prediction, we first verified that ORF conservation level correlates positively with length and expression level (Fig. 2a and Supplementary Fig. 5; refs 1, 10-12). These correlations suggest that genes evolve from non-genic ORFs that lengthen and increase in expression level over evolutionary time. A negative correlation between ORF length and expression level (ref. 21) was observed among ORFs5-10, but not among ORFs1-4 (Supplementary Fig. 5). Thus, some ORFs may increase in expression level at different rates than they increase in length over evolutionary time. Lengthening of ORFs could occur by loss of stop codons (possibly following translational read-through), by a shift of start codons, or by duplication followed by fusion with other ORFs (refs 10, 22). Increase in ORF expression level could be mediated by recruitment of existing regulatory elements (ref. 1). The proportion of ORFs located in the vicinity of transcription factor binding sites increases with conservation level, suggesting that novel regulatory elements could also emerge (Fig. 2a; ref. 1).
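
The correlations in this paragraph, and the "P and tau" values in Fig. 2, are Kendall rank correlations. The minimal pure-Python sketch below computes Kendall's tau-b, the tie-corrected variant used by common statistics packages, which matters here because many ORFs share the same conservation level. Whether the paper used tau-b specifically is an assumption; the function name and toy data are hypothetical, and the P value is not computed (`scipy.stats.kendalltau` returns both the statistic and a P value).

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences, O(n^2).

    Pairs tied in x only or in y only enter the tie correction in the
    denominator; pairs tied in both variables are ignored entirely.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical toy data: conservation level and length (nt) of eight ORFs.
levels = [0, 0, 1, 2, 3, 5, 7, 10]
lengths = [60, 75, 90, 120, 110, 300, 450, 900]
tau = kendall_tau_b(levels, lengths)  # positive: longer ORFs are more conserved
```

An O(n log n) merge-sort formulation exists for genome-scale inputs; the quadratic pair loop is kept here because it mirrors the definition directly.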

