Origins of De Novo Genes in Human and Chimpanzee.

Ruiz-Orera J, Hernandez-Rodriguez J, Chiva C, Sabidó E, Kondova I, Bontrop R, Marqués-Bonet T, Albà MM - PLoS Genet. (2015)

Bottom Line: Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS.


Affiliation: Evolutionary Genomics Group, Hospital del Mar Research Institute (IMIM), Barcelona, Spain.

ABSTRACT
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. To obtain a comprehensive view of this process, we performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes that have evidence of protein translation. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.


pgen.1005721.g002: Identification and characterization of de novo genes in human and chimpanzee. a) Simplified phylogenetic tree indicating the nine species considered in this study. In all species we had RNA-Seq data from several tissues. Chimpanzee, human, macaque and mouse were the species for which we performed strand-specific deep polyA+ RNA sequencing. We indicate the branches in which de novo genes were defined, together with the number of genes. b) Categories of transcripts in de novo genes based on genomic location. Intergenic: transcripts that do not overlap any other gene. Overlapping antisense: transcripts that overlap exons from other genes on the opposite strand. Overlapping intronic: transcripts that overlap introns from other genes on the opposite strand, with no exonic overlap. c) Classification of de novo genes based on existing evidence in databases. Annotated: genes classified as annotated in Ensembl v.75. EST/nr: non-annotated genes with BLAST hits (10^-4) to expressed sequence tags (EST) and/or non-redundant protein (nr) sequences in the same species. Novel: remaining genes. d) Patterns of gene expression in four tissues. Brain refers to frontal cortex. Transcripts with FPKM > 0 in a tissue are considered as expressed in that tissue. In red boxes, fraction of transcripts whose expression is restricted to that tissue (τ > 0.85, see Methods). Chimp conserved: transcripts assembled in chimpanzee not classified as de novo. Human conserved: transcripts assembled in human not classified as de novo. e) Number of testis GTEx samples with expression of de novo and conserved genes. We considered all annotated genes with FPKM > 0 in at least one testis sample. Conserved: genes sampled from the total pool of annotated genes analyzed in GTEx with the same distribution of FPKM values as in annotated de novo genes (n = 200).
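The tissue-restriction criterion in panel d (τ > 0.85) can be illustrated with a minimal sketch. It assumes that τ is the widely used tissue specificity index, τ = Σ(1 − x_i / x_max) / (n − 1), computed on untransformed FPKM values; the caption only points to Methods, so the exact definition and any transformation used by the authors are assumptions here, and the function names are hypothetical.

```python
def tissue_specificity_tau(fpkm_values):
    """Tissue specificity index (assumed definition, see lead-in).

    Returns a value in [0, 1]: 0 means equal expression in all tissues,
    1 means expression in a single tissue. Returns None when the
    transcript is not expressed in any tissue.
    """
    values = list(fpkm_values)
    n = len(values)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(values)
    if peak <= 0:
        return None
    # Sum of (1 - normalized expression) over tissues, scaled by n - 1.
    return sum(1 - v / peak for v in values) / (n - 1)


def is_tissue_restricted(fpkm_values, cutoff=0.85):
    """Apply the tau > 0.85 cutoff quoted in the figure caption."""
    tau = tissue_specificity_tau(fpkm_values)
    return tau is not None and tau > cutoff


def expressed_tissues(fpkm_by_tissue):
    """Tissues where FPKM > 0, the caption's definition of 'expressed'."""
    return [tissue for tissue, v in fpkm_by_tissue.items() if v > 0]
```

For example, a transcript with illustrative FPKM values of 10 in testis and 1 in each of three other tissues would have τ of about 0.9 under this definition and would count as tissue-restricted.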

Mentions: Next, we used BLAST-based sequence similarity searches [49] to identify the subset of de novo genes that could have originated in human, chimpanzee, or the common ancestor of these two species since the divergence from macaque (hominoid-specific genes). These genes lacked homologues in other species after exhaustive searches against the transcript assemblies described above, the transcript assemblies obtained using previously published non-stranded single-read RNA-Seq data for nine vertebrate species [50], Ensembl gene annotations for the same set of species, and the complete expressed sequence tag (EST) and non-redundant (nr) protein databases from NCBI. We also employed genomic alignments to discard any transcripts expressed in syntenic regions in other species that could have been missed by BLAST (S2 Fig). This pipeline identified 634 human-specific genes (1,029 transcripts), 780 chimpanzee-specific genes (1,307 transcripts), and 1,300 hominoid-specific genes (3,062 transcripts). Taken together, the total number of candidate de novo genes was 2,714 (5,398 transcripts) (Fig 2a). The remaining genes are referred to as conserved genes.
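The category logic and the reported totals can be checked with a short sketch. The classification rule below is a simplified reading of the paragraph above, not the authors' pipeline: it assumes a gene with any homologue outside human and chimpanzee is conserved, and that detection in both human and chimpanzee maps to the hominoid-specific class. The function names and boolean inputs are hypothetical; the counts are the ones quoted in the text.

```python
def classify_lineage(in_human, in_chimp, homolog_in_other_species):
    """Assign a gene to a lineage category (simplified assumption)."""
    if homolog_in_other_species:
        return "conserved"
    if in_human and in_chimp:
        return "hominoid-specific"
    if in_human:
        return "human-specific"
    if in_chimp:
        return "chimpanzee-specific"
    raise ValueError("gene not detected in human or chimpanzee")


# (genes, transcripts) per category, as reported in the text.
REPORTED = {
    "human-specific": (634, 1029),
    "chimpanzee-specific": (780, 1307),
    "hominoid-specific": (1300, 3062),
}


def totals(table):
    """Sum genes and transcripts over all de novo categories."""
    genes = sum(g for g, _ in table.values())
    transcripts = sum(t for _, t in table.values())
    return genes, transcripts


if __name__ == "__main__":
    print(totals(REPORTED))  # prints (2714, 5398)
```

The per-category counts add up to the stated totals of 2,714 genes and 5,398 transcripts.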

