The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing.

Alon S, Garrett SC, Levanon EY, Olson S, Graveley BR, Rosenthal JJ, Eisenberg E - Elife (2015)

Bottom Line: These studies on few established models have led to the general assumption that recoding by RNA editing is extremely rare. Here we employ a novel bioinformatic approach with extensive validation to show that the squid Doryteuthis pealeii recodes proteins by RNA editing to an unprecedented extent. Recoding is tissue-dependent, and enriched in genes with neuronal and cytoskeletal functions, suggesting it plays an important role in brain physiology.


Affiliation: George S Wise Faculty of Life Sciences, Department of Neurobiology, Tel Aviv University, Tel Aviv, Israel.

ABSTRACT
RNA editing by adenosine deamination alters genetic information from the genomic blueprint. When it recodes mRNAs, it gives organisms the option to express diverse, functionally distinct, protein isoforms. All eumetazoans, from cnidarians to humans, express RNA editing enzymes. However, transcriptome-wide screens have only uncovered about 25 transcripts harboring conserved recoding RNA editing sites in mammals and several hundred recoding sites in Drosophila. These studies on few established models have led to the general assumption that recoding by RNA editing is extremely rare. Here we employ a novel bioinformatic approach with extensive validation to show that the squid Doryteuthis pealeii recodes proteins by RNA editing to an unprecedented extent. We identify 57,108 recoding sites in the nervous system, affecting the majority of the proteins studied. Recoding is tissue-dependent, and enriched in genes with neuronal and cytoskeletal functions, suggesting it plays an important role in brain physiology.

Figure 2—figure supplement 4. Quality controls for the A-to-G modifications and the non A-to-G modifications. (A) The distribution of the quality scores for all the sites used (all the positions inside all the analyzed reads), A-to-G modifications, and non A-to-G modifications. No difference is observed between these three groups. Note that sites with Q < 30 were excluded. (B) The number of mismatches detected as a function of the position inside the read. Non A-to-G mismatches tend to occur at read ends, suggesting that alignment artifacts (which tend to affect read ends) are responsible for some of these mismatches (Ramaswami et al., 2012). A-to-G mismatches do not show such a tendency. (C) The distribution of modification levels for A-to-G and non A-to-G sites, for the GFL and OL tissues. The increased number of non A-to-G sites with ∼50% modification level hints that some genomic polymorphisms (SNPs), which were not represented in our DNA reads due to the limited coverage, are included among the non A-to-G mismatches. Consistently, 51% of the sites with non A-to-G modification levels between 40–60% recur in both tissues (coming from the same individual animal), compared to only 22% of the A-to-G modifications in the same range. Similarly, 50% of non A-to-G modification levels higher than 90% recur in both tissues (coming from the same individual animal), compared to only 21% for A-to-G modifications in the same range. These two ranges are the only ones in which such a difference is observed. Abbreviations: Giant fiber lobe (GFL), Optic lobe (OL), quality score (Q). DOI: http://dx.doi.org/10.7554/eLife.05198.008
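
The cross-tissue recurrence comparison in panel (C) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function name `recurrence_fraction`, the `(transcript, position)` site key, and the toy data are hypothetical. It also assumes that a site "recurs" when it falls in the chosen modification-level range in one tissue and is detected at any level in the other; the caption does not spell out the exact criterion.

```python
from typing import Dict, Tuple

# Hypothetical site key: (transcript id, position in transcript).
Site = Tuple[str, int]


def recurrence_fraction(
    levels_a: Dict[Site, float],
    levels_b: Dict[Site, float],
    lo: float,
    hi: float,
) -> float:
    """Fraction of tissue-A sites with level in [lo, hi] that are also seen in tissue B.

    Assumption: "recurs" means the site is detected in tissue B at any level.
    Returns 0.0 when no tissue-A site falls in the range.
    """
    in_range = [site for site, level in levels_a.items() if lo <= level <= hi]
    if not in_range:
        return 0.0
    shared = sum(1 for site in in_range if site in levels_b)
    return shared / len(in_range)


# Toy data only; these are not values from the paper.
gfl = {("t1", 10): 0.50, ("t1", 50): 0.45, ("t2", 7): 0.95, ("t3", 3): 0.10}
ol = {("t1", 10): 0.52, ("t2", 7): 0.90}

print(recurrence_fraction(gfl, ol, 0.4, 0.6))  # sites near 50% modification
print(recurrence_fraction(gfl, ol, 0.9, 1.0))  # sites above 90% modification
```

Under this reading, a high recurrence fraction near 50% or above 90% is what one would expect from heterozygous or homozygous genomic variants, since both tissues come from the same animal.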


Mentions: Although the number of A-to-G discrepancies was unexpectedly large, subsequent analyses support the idea that they are caused by RNA editing rather than other sources of error. First, we applied our pipeline to similarly sized data sets from a human blood sample and from the rhesus macaque brain, each containing matching RNA and DNA sequence reads. As expected for mammals, the quantity of AG mismatches in coding regions was similar to that of non-AG mismatches, and both were quantitatively indistinguishable from the noise determined from the squid data (Figure 2A and Supplementary file 1A,B). These controls demonstrate that the enormous number of AG mismatches in the squid data is not an artefact of our analysis pipeline. Other features point to the biological origin of our AG mismatches. Similar to A-to-I editing sites in other organisms (Morse et al., 2002; Levanon et al., 2004; Kleinberger and Eisenberg, 2010), those identified here tend to cluster and show distinctive 5′ and 3′ neighbor preferences (Figure 2—figure supplement 1). In addition, hierarchical clustering of results from five tissues reveals that A-to-G modifications, but not other types, exhibit clear tissue-specificity, suggesting they do not result from genomic polymorphisms and mapping artifacts (Figure 2—figure supplement 2). No A-to-G overrepresentation is observed in mitochondria-encoded genes (Figure 2A), in agreement with the absence of ADARs, and by extension A-to-I editing, in the mitochondria. Finally, direct Sanger sequencing from a second individual confirmed editing at 40/40 A-to-G sites, and deep-sequencing validated 120/143 A-to-G sites but none of the 12 non A-to-G sites tested (Figure 2—figure supplement 3, Supplementary file 1C–G). Taken together, the overrepresentation of A-to-G modifications over all other types, the motifs surrounding the A-to-G sites, the tissue-specific modification levels, and the validation experiments provide evidence that the majority of the A-to-G modifications are true editing events, while most non A-to-G modifications are likely technical artifacts or genomic variations (Zaranek et al., 2010; Ramaswami et al., 2012) (Figure 2—figure supplement 4).
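
The validation counts quoted above can be turned into rates with a few lines of Python. The sketch below is illustrative only. The dictionary keys and the helper names `rate` and `p_all_in_failures` are hypothetical. The hypergeometric calculation is an added illustration, not a test reported in the text: it asks how likely it would be for all 12 non A-to-G sites to fall among the sites that failed deep-sequencing validation if failure were unrelated to mismatch type.

```python
from math import comb

# (confirmed, tested) counts as quoted in the text above.
validation = {
    "sanger_a_to_g": (40, 40),
    "deep_seq_a_to_g": (120, 143),
    "deep_seq_non_a_to_g": (0, 12),
}


def rate(confirmed: int, tested: int) -> float:
    """Fraction of tested sites that were confirmed."""
    return confirmed / tested


def p_all_in_failures(n_total: int, n_failed: int, n_group: int) -> float:
    """Hypergeometric probability that every member of a group of size n_group
    lands among the n_failed failures out of n_total sites, under random assignment."""
    return comb(n_failed, n_group) / comb(n_total, n_group)


rates = {name: rate(*counts) for name, counts in validation.items()}

# Deep-sequencing set: 143 A-to-G + 12 non A-to-G sites; 23 + 12 sites failed.
n_total = 143 + 12
n_failed = (143 - 120) + 12
p_value = p_all_in_failures(n_total, n_failed, 12)

for name, value in rates.items():
    print(f"{name}: {value:.1%}")
print(f"illustrative one-sided p-value: {p_value:.2e}")
```

The deep-sequencing rate for A-to-G sites works out to about 84%, against 0% for the non A-to-G sites tested.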

