Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain.

Ponjavic J, Oliver PL, Lunter G, Ponting CP - PLoS Genet. (2009)

Bottom Line: Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.

Affiliation: MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.

ABSTRACT
Besides protein-coding mRNAs, eukaryotic transcriptomes include many long non-protein-coding RNAs (ncRNAs) of unknown function that are transcribed away from protein-coding loci. Here, we have identified 659 intergenic long ncRNAs whose genomic sequences individually exhibit evolutionary constraint, a hallmark of functionality. Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. This led us to the hypothesis that spatiotemporal co-expression of ncRNAs and nearby protein-coding genes represents a general phenomenon, a prediction that was confirmed subsequently by in situ hybridisation in developing and adult mouse brain. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.


Figure 1: A set of 659 non-coding RNA (ncRNA) transcripts, where each exhibits evidence of constraint on nucleotide substitutions since the mouse-human last common ancestor, shows significant enrichments in sequence predicted to contain folded RNA structures. (A) An aggregated histogram showing 1,113 ncRNAs whose relative substitution rates in mouse-human comparisons could be estimated reliably (see Materials and Methods). Each bin provides the number of ncRNAs whose relative substitution rate falls within a given interval. Brain-expressed ncRNAs are indicated in blue, non-brain-expressed ncRNAs in red, and ncRNAs that exhibit significantly reduced substitution rates are represented as non-shaded bars. Of all ncRNAs with relative substitution rates between 0.9 and 1.0, 93% exhibit rates that are not significantly different from likely selectively neutral sequence and were, therefore, classified as non-constrained (shaded bars). (B) Evofold-predicted RNA secondary structures (red bars) and conserved sequence (of two types: either PhastCons multispecies conserved elements [MCSs; dark blue] or indel-purified segments [IPSs; light blue]) are each significantly enriched within constrained long ncRNAs. Such ncRNAs also tend to be depleted within segmentally duplicated (SDs; light green) and human copy number variable (CNVs; dark green) sequence. Checkmarks and crosses indicate whether there is evidence for long ncRNAs to be expressed in the brain and to show sequence constraint (see main text). The fold difference (X-axis) is shown on a log2 scale. An asterisk (*) indicates that an ncRNA set is significantly enriched/depleted in an annotation when compared with annotation densities in G+C-matched and randomly-sampled sequences (p<2×10^−4).
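The comparison behind panel (B) can be illustrated with a minimal sketch. It assumes that the fold difference is the observed annotation density divided by the mean density over the matched random samples, and that significance is an empirical one-sided p-value over those samples; the caption does not spell out either detail. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def log2_fold(observed, null_densities):
    """log2 of observed density over the mean density of the random samples."""
    expected = sum(null_densities) / len(null_densities)
    return math.log2(observed / expected)


def empirical_p(observed, null_densities):
    """One-sided empirical p-value for enrichment.

    Counts random samples at least as dense as the observation and adds one
    to numerator and denominator so the estimate is never exactly zero.
    """
    as_extreme = sum(1 for x in null_densities if x >= observed)
    return (as_extreme + 1) / (len(null_densities) + 1)


# Toy values only (not taken from the paper): annotation density in the
# ncRNA set versus densities in four hypothetical G+C-matched random samples.
observed_density = 4.0
random_densities = [1.0, 1.0, 1.0, 1.0]

print(log2_fold(observed_density, random_densities))
print(empirical_p(observed_density, random_densities))
```

With this estimator the smallest attainable p-value is 1/(N+1) for N random samples, so the number of samples sets the resolution of the test.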

Mentions: We started by analysing 3,122 long ncRNAs transcribed from intergenic regions (see Materials and Methods) that, when considered together, exhibit evolutionary constraint [15]. Among these ncRNAs, we then identified 659 long ncRNAs that individually show evidence of constraint (hereafter termed constrained long ncRNAs): individually, their mouse-human nucleotide substitution rate is significantly (p<2.5×10^−2) suppressed relative to rates for neighbouring transposable elements (Figure 1A; see Materials and Methods). As expected from these suppressed rates, many of these constrained long ncRNA loci (for example, AK034244, AK034417, AK039739, and AK048867) are alignable to the genomes of more distantly-related species, such as chicken. Henceforth, we focus on these 659 constrained ncRNAs since they are more likely to be functional, and less likely to represent random transcriptional events. Indeed, this is consistent with constrained ncRNAs being more frequently supported by CAGE (Cap-analysis gene expression) tag evidence [1],[18] than are non-constrained ncRNAs (319/659, 48% versus 537/1932, 28%, respectively; p<10^−4, χ²-test).
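The CAGE comparison quoted above can be checked with a short sketch. It assumes a Pearson χ² test on the 2×2 table of supported versus unsupported ncRNAs, without continuity correction; the text names only a "χ²-test", so the exact variant is an assumption, and the function name is hypothetical. Only the four counts come from the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts quoted in the text: ncRNAs with CAGE tag support / total in each group.
constrained_cage, constrained_total = 319, 659
nonconstrained_cage, nonconstrained_total = 537, 1932

chi2, p_value = chi_square_2x2(
    constrained_cage, constrained_total - constrained_cage,
    nonconstrained_cage, nonconstrained_total - nonconstrained_cage,
)

print(f"constrained: {constrained_cage / constrained_total:.0%}")
print(f"non-constrained: {nonconstrained_cage / nonconstrained_total:.0%}")
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

Under these assumptions the two proportions round to the quoted 48% and 28%, and the p-value falls well below the reported 10^−4 threshold.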

