Estimating the fraction of non-coding RNAs in mammalian transcriptomes.

Xin Y, Quarta G, Gan HH, Schlick T - Bioinform Biol Insights (2008)

Bottom Line: Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. We use this model to analyze six representative datasets identified by the FANTOM3 project and two computational approaches based on comparative analysis (RNAz and EvoFold).


Affiliation: Department of Chemistry, 251 Mercer Street, New York University, New York, NY 10012, USA.

ABSTRACT
Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. The relative z-score we use helps distinguish the known ncRNA class from the genome, intergene and intron classes. This leads us to a fractional ncRNA measure for putative ncRNA datasets, which we model as a mixture of genuine ncRNAs and other transcripts derived from genomic, intergenic and intronic sequences. We use this model to analyze six representative datasets identified by the FANTOM3 project and two computational approaches based on comparative analysis (RNAz and EvoFold). Our analysis suggests fewer ncRNAs than estimated by DNA sequencing and comparative analysis, but validating our approach and its predictions will require more extensive experimental RNA data.
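The abstract models a putative ncRNA dataset as a mixture of genuine ncRNAs and other transcripts, and reads a fractional ncRNA measure off its relative z-score. This excerpt does not spell out how a score is mapped to a fraction, so the following is only a hypothetical sketch. It assumes a calibration table of (known ncRNA fraction, relative z-score) pairs, for instance measured on synthetic mixtures of known ncRNAs and genomic sequence, and inverts that table by piecewise-linear interpolation. The function `ncrna_fraction` and the numbers in the usage line are illustrative and are not taken from the paper.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def ncrna_fraction(z: float, calibration: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical inverse of a mixing curve.

    `calibration` holds (ncRNA fraction, relative z-score) pairs measured on
    mixtures of known composition; the score must be strictly monotonic in
    the fraction. Scores outside the calibrated range are clipped to the
    nearest end point.
    """
    pts = sorted(calibration)                 # order by fraction
    if len(pts) < 2:
        raise ValueError("need at least two calibration points")
    zs = [score for _, score in pts]
    steps = [b - a for a, b in zip(zs, zs[1:])]
    if all(s < 0 for s in steps):             # score falls as fraction rises
        pts.reverse()
        zs.reverse()
    elif not all(s > 0 for s in steps):
        raise ValueError("score must be strictly monotonic in the fraction")
    # From here on zs is strictly increasing.
    if z <= zs[0]:
        return pts[0][0]
    if z >= zs[-1]:
        return pts[-1][0]
    i = bisect_left(zs, z)                    # zs[i-1] < z <= zs[i]
    (f0, z0), (f1, z1) = pts[i - 1], pts[i]
    return f0 + (f1 - f0) * (z - z0) / (z1 - z0)


# Toy two-point calibration (made-up numbers): pure background scores 1.0,
# pure ncRNA scores 3.0, so the estimate reduces to linear mixing.
print(ncrna_fraction(2.0, [(0.0, 1.0), (1.0, 3.0)]))  # → 0.5
```

A two-point table amounts to assuming that the score mixes linearly between the two pure classes. A missing-word statistic need not behave that way, so a denser calibration over several known fractions is the safer choice.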




Fig. 4. The degree of randomness of the six putative ncRNA datasets, as measured by the DNA test. The panels show the relative z-score distributions of (a) EvoFold, (b) RNAz set2.P0.5, (c) FANTOM3 putative, (d) RNAz set1.P0.5, (e) FANTOM3 stringent and (f) RNAz set1.P0.9.
© Copyright Policy - open-access

Mentions: We now assess the six putative ncRNA datasets listed in Table 2 using the relative z-score. The total length of the EvoFold dataset is 1,869,205 nt, which is shorter than the length required by the DNA test (2,097,152 nt; Marsaglia and Zaman, 1993), so randomly selected ncRNAs are added to reach the required length. The relative z-score of this “pseudo” EvoFold dataset (1.437) is therefore an estimate of the true value. A second “pseudo” EvoFold dataset, created by adding genomic sequences instead, has almost the same relative z-score (1.436). The DNA test shows that none of the six datasets has a relative z-score close to that of the ncRNA class (Fig. 4). Instead, the six datasets form non-overlapping relative z-score distributions that fall within the genome/intergene/intron cluster. In order of decreasing degree of randomness, the datasets are EvoFold, RNAz set2.P0.5, FANTOM3 putative, RNAz set1.P0.5, FANTOM3 stringent and RNAz set1.P0.9.
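The DNA test cited here is one of Marsaglia and Zaman's "monkey tests", also distributed in Marsaglia's Diehard battery. The minimal Python sketch below assumes the Diehard formulation: nucleotides form a 4-letter alphabet, 2^21 overlapping 10-letter words are read from the sequence, the number of the 4^10 = 2^20 possible words that never occur is counted, and that count is compared with Diehard's expected 141,909 missing words (standard deviation 339). Forming 2^21 overlapping words takes 2^21 + 9 letters, slightly more than the 2,097,152 nt quoted above; how the authors handle those 9 letters is not stated, so the sketch simply requires them. The result is a plain z-score, not the paper's relative z-score, whose normalization is not given in this excerpt. The helper `pad_to_length` is hypothetical: it mirrors the padding step described for EvoFold, but truncating the last added sequence and ignoring the artificial junctions between concatenated sequences are assumptions of this sketch.

```python
import random
from typing import Sequence

WORD_LEN = 10                        # letters per word in the DNA test
N_WORDS = 1 << 21                    # 2**21 overlapping words per test string
N_LETTERS = N_WORDS + WORD_LEN - 1   # letters needed to form N_WORDS words
MEAN_MISSING = 141909                # Diehard's expected count of missing words
SIGMA_MISSING = 339                  # Diehard's (simulated) standard deviation

# Two bits per nucleotide; U is folded into T so RNA sequences also work.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def dna_test_z(seq: str) -> float:
    """z-score of the number of 10-letter words absent from the first
    N_LETTERS letters of `seq` (Diehard-style DNA test)."""
    seq = seq.upper()
    if len(seq) < N_LETTERS:
        raise ValueError(f"need at least {N_LETTERS} nt, got {len(seq)}")
    mask = (1 << (2 * WORD_LEN)) - 1       # keep only the last 10 letters
    seen = bytearray(1 << (2 * WORD_LEN))  # one flag per possible word
    word = 0
    for i, ch in enumerate(seq[:N_LETTERS]):
        code = CODE.get(ch)
        if code is None:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
        word = ((word << 2) | code) & mask
        if i >= WORD_LEN - 1:              # first full word ends at index 9
            seen[word] = 1
    missing = len(seen) - sum(seen)
    return (missing - MEAN_MISSING) / SIGMA_MISSING


def pad_to_length(seq: str, pool: Sequence[str], target: int,
                  rng: random.Random) -> str:
    """Hypothetical helper: append randomly chosen sequences from `pool`
    until `seq` reaches `target` letters; only the padding is truncated."""
    if len(seq) >= target:
        return seq
    if not any(pool):
        raise ValueError("pool must contain at least one non-empty sequence")
    parts, total = [seq], len(seq)
    while total < target:
        extra = rng.choice(pool)
        parts.append(extra)
        total += len(extra)
    return "".join(parts)[:target]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins, not EvoFold data: a short random "dataset" and a pool
    # of random 200-nt "ncRNAs" used for padding.
    short = "".join(rng.choices("ACGT", k=1_869_205))
    pool = ["".join(rng.choices("ACGT", k=200)) for _ in range(100)]
    padded = pad_to_length(short, pool, N_LETTERS, rng)
    print(round(dna_test_z(padded), 2))
```

A well-mixed pseudo-random sequence should give a z-score within a few units of zero, while a highly repetitive sequence leaves most words unused and yields a very large positive z-score.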

