Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes.

Chodroff RA, Goodstadt L, Sirey TM, Oliver PL, Davies KE, Green ED, Molnár Z, Ponting CP - Genome Biol. (2010)

Bottom Line: In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths for these non-coding genes are all highly variable. The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.


Affiliation: Department of Physiology, Anatomy, and Genetics, Le Gros Clark Building South Parks Road, University of Oxford, Oxford OX1 3QX, UK.

ABSTRACT

Background: Protein has long been considered the building block of life, but it is now apparent that it is only one of many functional products generated by the eukaryotic genome. Indeed, more of the human genome is transcribed into noncoding sequence than into protein-coding sequence. Nevertheless, whilst we have developed a deep understanding of the relationships between evolutionary constraint and function for protein-coding sequence, little is known about these relationships for non-coding transcribed sequence. This dearth of information is partially attributable to a lack of established non-protein-coding RNA (ncRNA) orthologs among birds and mammals within sequence and expression databases.

Results: Here, we performed a multi-disciplinary study of four highly conserved and brain-expressed transcripts selected from a list of mouse long intergenic noncoding RNA (lncRNA) loci that generally show pronounced evolutionary constraint within their putative promoter regions and across exon-intron boundaries. We identify some of the first lncRNA orthologs present in birds (chicken), marsupial (opossum), and eutherian mammals (mouse), and investigate whether they exhibit conservation of brain expression. In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths for these non-coding genes are all highly variable.

Conclusions: The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.



Figure 3: Evolutionary constraint of AK082072. (a) The genomic region of mouse chromosome 13 (chr13) encompassing lncRNA AK082072 (523 bp) is depicted. Note the locations of the flanking protein-coding genes: Tmem161b (transmembrane protein 161b) and Mef2C (myocyte enhancer factor 2C). (b) A more detailed representation of AK082072 (exons highlighted in orange) and its immediate flanking regions. Below the gene structures are the positions of H3K4me3 chromatin marks (green) detected in mouse brain, VISTA conserved non-coding midbrain enhancer element 268 (obtained from the UCSC Genome Browser), and a BLAT alignment of the chicken AK082072 ortholog, as well as similar tracks as those in Figure 2b. Note the detected homology with orthologous frog sequence in exon 1. (c) Conservation and relative sizes of AK082072 orthologs in various species. Note the sequence conservation (relative to the mouse sequence) at both the 5' and 3' ends and the conserved position of splice sites (green). Unlike the other vertebrate genomes considered, the zebra finch genome did not align to the proximal promoter or first exon of mouse AK082072. This apparent lack of sequence identity might reflect either an unannotated gap in its genome assembly or rapidly evolving sequence within its orthologous genomic region. Other details are provided in the legend to Figure 2. ECR, evolutionarily conserved region.

Mentions: The three selected lncRNA loci harbor elements that are more usually associated with protein-coding genes. These include GT-AG donor-acceptor splice sites, polyadenylation signals, and chromatin marks in their putative promoter regions (Figures 2b,c, 3b,c and 4b,c; Figure S1 in Additional file 1). Aceview annotations [33] indicate an unspliced (single exon) transcript and single promoter for the AK043754 locus (spanning 1.75 kb on mouse chromosome 6qG1), a single canonical GT-AG intron and promoter for the AK082072 locus (39.7 kb on mouse chromosome 13qC3), and 31 different GT-AG introns in at least 16 different mRNA splice variants and 6 probable alternative promoters for the AK082467 locus (94 kb on mouse chromosome 10qC2). Each lncRNA sequence is supported by several GenBank cDNA records, representing cDNAs derived primarily from mouse embryonic or neonatal central nervous system tissues, including hypothalamus, diencephalon, cortex, cerebellum, and spinal cord. Many of the supporting GenBank records additionally support poly(A) and 5' cap structures, indicating that each lncRNA is most likely transcribed by RNA polymerase II. Chromatin marks from either mouse embryonic stem cells or adult mouse whole brain [34] are present at each putative lncRNA promoter (Figures 2b, 3b and 4b).
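Two of the sequence features named above, canonical GT-AG splice sites and polyadenylation signals, are simple enough to check directly on a sequence string. The following is a minimal sketch, not the authors' method: the function names are hypothetical, the sequences are made-up toy strings rather than real lncRNA sequence, and the use of the common AATAAA hexamer and a 50-base search window are assumptions made for illustration only.

```python
from typing import List


def is_canonical_intron(intron: str) -> bool:
    """Return True if a DNA intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def find_polya_signals(transcript: str, window: int = 50) -> List[int]:
    """Return 0-based start positions of AATAAA hexamers in the last `window` bases.

    The hexamer and the window size are illustrative assumptions.
    """
    seq = transcript.upper().replace("U", "T")
    start = max(0, len(seq) - window)
    hits = []
    pos = seq.find("AATAAA", start)
    while pos != -1:
        hits.append(pos)
        pos = seq.find("AATAAA", pos + 1)
    return hits


# Toy inputs (invented for this example, not taken from any database).
toy_intron = "GTAAGTCCTTAG"
toy_noncanonical = "GCAAGTCCTTAG"
toy_transcript = "ACGT" * 5 + "AATAAA" + "CTGACTGACT"

print(is_canonical_intron(toy_intron))        # canonical GT...AG intron
print(is_canonical_intron(toy_noncanonical))  # GC donor, so not canonical
print(find_polya_signals(toy_transcript))     # positions of AATAAA near the 3' end
```

A check like this only flags the presence of the motifs; it says nothing about whether a site is actually used, which is why the passage above also relies on cDNA and chromatin evidence.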

