Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression.

Pervouchine DD, Djebali S, Breschi A, Davis CA, Barja PP, Dobin A, Tanzer A, Lagarde J, Zaleski C, See LH, Fastuca M, Drenkow J, Wang H, Bussotti G, Pei B, Balasubramanian S, Monlong J, Harmanci A, Gerstein M, Beer MA, Notredame C, Guigó R, Gingeras TR - Nat Commun (2015)

Bottom Line: This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Affiliation: [1] Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona 08003, Spain [2] Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskie Gory 1-73, 119992 Moscow, Russia.

ABSTRACT
Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Figure 4: Genes with constrained expression. (a) The distribution of the dynamic range (DNR, log10 of the ratio of the largest and the lowest non-zero observation) of gene expression level in orthologous genes across human and mouse samples. (b) Venn diagram of the relationship between orthologous and constrained genes. (c) Proportion of nucleotides in expressed genes, as assessed by PolyA+ RNA-seq, that originates from constrained genes in human cell lines and mouse tissues. The labelled outliers correspond to mouse embryonic samples. (d,e) The distribution of DNR in human/mouse constrained and unconstrained genes in Merkin et al. (ref. 10) (d) and Barbosa-Morais et al. (ref. 9) (e). (f) The joint distribution of log10 average gene reads per kilobase per million mapped reads (RPKM) in pairs of orthologous protein-coding genes; constrained genes are shown in red. (g) The distribution of promoter, transcript and protein pairwise sequence identity between human and mouse in constrained and unconstrained genes.

Mentions: As previously reported (refs 8, 27), we found that within a given cell population the levels of gene expression may vary by up to six orders of magnitude (Supplementary Fig. 8). In contrast, the expression of a given gene across cell types varies relatively little. Using a two-factor analysis of variance (ANOVA), with gene and cell type as the two factors, we found that the variance across genes accounts for 76% and 71% of the total variance of gene expression across human and mouse cell types, respectively, whereas the fraction of the variance that can be directly attributed to cell type is less than 1% (Supplementary Methods). Indeed, we found a large fraction of genes whose expression varies relatively little across tissues and species. In Fig. 4a, we computed the distribution of the dynamic range of gene expression (DNR) in orthologous genes across the entire set of human and mouse samples. For each gene expressed in at least two samples in both human and mouse, we computed the log10 ratio between the highest and lowest measured expression. The distribution is bimodal, uncovering two broad gene classes. This is not an effect of comparing mouse tissues with human cell lines, as the same pattern is obtained when using RNA-seq data from human tissues produced by the Illumina Body Map project (Supplementary Fig. 9). We decomposed the distribution assuming two underlying Gaussians, and took DNR=2 as the threshold, at their approximate intersection point (Supplementary Fig. 10 and Supplementary Methods). In this way, we obtained a set of 6,636 genes (~40% of all 15,736 orthologous genes, Fig. 4b) whose expression remains relatively constant (that is, within two orders of magnitude) across species and cell types. Genes with constrained expression show wider expression breadth (Supplementary Fig. 11A) and less tissue specificity than the rest of the orthologues (less than 10% of tissue-specific orthologues are included in this set, Supplementary Methods).
However, they can nonetheless be detected as differentially expressed at a rate similar to the rest of the orthologues (82% versus 89%). They also show higher expression levels (Supplementary Fig. 11B). Therefore, although they represent only about 17% of all annotated genes, they capture a high proportion of the polyA+ transcription in the cell (39% on average in human and 41% in mouse), a proportion that remains remarkably constant across all tested human and mouse cell types (Fig. 4c). Mouse embryonic samples are an exception, with constrained genes generating only about 20% of the cell’s transcriptional output. We also found a negative association between minimal expression and DNR (Supplementary Fig. 12). Thus, genes with constrained expression also tend to have higher minimal expression, suggesting that these genes in their default state may already be primed for transcription. To eliminate gene expression as a potential confounding factor, most downstream analyses were carried out in a subset of 5,519 genes with constrained expression, and in a subset of identical size from the rest of the orthologous genes for which we could compute DNR, with matched expression in human and mouse (the unconstrained set, Supplementary Fig. 11C and Supplementary Methods).
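
The DNR definition and the DNR=2 cutoff described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names are hypothetical, and three choices are assumptions made only for illustration: human and mouse values are pooled before taking the ratio, the comparison with the threshold is strict, and the input to `constrained_share` is a per-gene amount of mapped nucleotides for one sample. The two-Gaussian decomposition used to choose the cutoff is not reproduced; the value 2 is simply taken from the text.

```python
import math


def dynamic_range(values):
    """log10 of (largest / smallest) over the non-zero observations.

    Returns None when fewer than two non-zero values are available.
    """
    nonzero = [v for v in values if v > 0]
    if len(nonzero) < 2:
        return None
    return math.log10(max(nonzero) / min(nonzero))


def gene_dnr(human, mouse):
    """DNR of one orthologous gene across human and mouse samples.

    The gene must be expressed in at least two samples in each species.
    Pooling the two species before taking the ratio is an assumption.
    """
    if sum(v > 0 for v in human) < 2 or sum(v > 0 for v in mouse) < 2:
        return None
    return dynamic_range(list(human) + list(mouse))


def is_constrained(dnr, threshold=2.0):
    """True if expression stays within `threshold` orders of magnitude.

    The default 2.0 is the cutoff quoted in the text; the strict
    inequality is an assumption.
    """
    return dnr is not None and dnr < threshold


def constrained_share(sample_amounts, constrained_genes):
    """Fraction of one sample's total output that comes from constrained genes.

    `sample_amounts` maps gene id -> amount (assumed here to be mapped
    nucleotides); `constrained_genes` is a set of gene ids.
    """
    total = sum(sample_amounts.values())
    if total == 0:
        return 0.0
    inside = sum(v for g, v in sample_amounts.items() if g in constrained_genes)
    return inside / total
```

For example, a gene with values 5 and 50 in human and 10 and 20 in mouse spans one order of magnitude and would be classified as constrained under these assumptions, whereas a gene ranging from 0.1 to 1,000 spans four and would not.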

