Identification of candidate regulatory sequences in mammalian 3' UTRs by statistical analysis of oligonucleotide distributions.

Corà D, Di Cunto F, Caselle M, Provero P - BMC Bioinformatics (2007)

Bottom Line: The second method is based on the identification of oligonucleotides showing statistically significant strand asymmetry in their distribution in 3' UTRs. Both methods are able to identify many previously known binding sites located in 3' UTRs, and in particular seed regions of known miRNAs. Many new candidates are proposed for experimental verification.


Affiliation: Dept of Theoretical Physics, University of Turin and INFN, Turin, Italy. cora@to.infn.it

ABSTRACT

Background: 3' untranslated regions (3' UTRs) contain binding sites for many regulatory elements, and in particular for microRNAs (miRNAs). The importance of miRNA-mediated post-transcriptional regulation has become increasingly clear in the last few years.

Results: We propose two complementary approaches to the statistical analysis of oligonucleotide frequencies in mammalian 3' UTRs aimed at the identification of candidate binding sites for regulatory elements. The first method is based on the identification of sets of genes characterized by evolutionarily conserved overrepresentation of an oligonucleotide. The second method is based on the identification of oligonucleotides showing statistically significant strand asymmetry in their distribution in 3' UTRs.
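The abstract does not say how overrepresentation is scored. As a rough illustration of the first approach, the sketch below scores one oligonucleotide in one 3' UTR with a binomial tail probability against a background frequency, and calls the oligonucleotide "conserved overrepresented" only if the human and the mouse orthologous UTRs both pass a threshold. The binomial background model, the helper names (`count_occurrences`, `binomial_tail`, `conserved_overrepresentation`), the uniform background frequency and the `alpha` threshold are all assumptions made for this sketch, not the authors' published procedure.

```python
from math import exp, lgamma, log, log1p


def count_occurrences(seq: str, word: str) -> int:
    """Count possibly overlapping occurrences of word in seq."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def binomial_tail(n_obs: int, trials: int, p: float) -> float:
    """P(X >= n_obs) for X ~ Binomial(trials, p), summed in log space to avoid overflow."""
    if n_obs <= 0:
        return 1.0
    if n_obs > trials:
        return 0.0
    log_terms = (
        lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
        + k * log(p) + (trials - k) * log1p(-p)
        for k in range(n_obs, trials + 1)
    )
    return sum(exp(t) for t in log_terms)


def conserved_overrepresentation(human_utr, mouse_utr, word, p_background, alpha=1e-3):
    """Hypothetical rule: word must be significantly overrepresented in BOTH orthologous UTRs."""
    for utr in (human_utr, mouse_utr):
        trials = len(utr) - len(word) + 1  # number of windows where word could start
        if trials <= 0:
            return False
        n_obs = count_occurrences(utr, word)
        if binomial_tail(n_obs, trials, p_background) >= alpha:
            return False
    return True
```

For example, with a uniform background of `4 ** -7` per position, a UTR carrying five copies of a 7-mer passes easily, while an ortholog with no copies makes the conserved test fail. A real analysis would estimate the background from sequence composition rather than assume uniformity.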

Conclusion: Both methods are able to identify many previously known binding sites located in 3' UTRs, and in particular seed regions of known miRNAs. Many new candidates are proposed for experimental verification.
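To relate a candidate 7-mer to a known miRNA, a common convention (assumed here; the matching rule used in the paper is not given in this excerpt) is to compare it with the reverse complement of miRNA nucleotides 2 to 8, the so-called 7mer-m8 seed site. The helper names and the candidate list in the test are hypothetical; the let-7a sequence used as an example is the standard mature human let-7a.

```python
# RNA complement table (A<->U, C<->G).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site_7mer(mirna: str) -> str:
    """Return, as DNA, the 7-mer target site complementary to miRNA nucleotides 2-8."""
    rna = mirna.upper().replace("T", "U")    # accept DNA- or RNA-style input
    seed = rna[1:8]                          # 1-based positions 2..8
    site = seed.translate(_RNA_COMPLEMENT)[::-1]  # reverse complement
    return site.replace("U", "T")            # report in the DNA alphabet used for UTRs


def seed_hits(candidates, mirnas):
    """Map each miRNA name to its 7-mer site if that site is among the candidate 7-mers."""
    cands = {c.upper() for c in candidates}
    hits = {}
    for name, seq in mirnas.items():
        site = seed_site_7mer(seq)
        if site in cands:
            hits[name] = site
    return hits
```

For let-7a (`UGAGGUAGUAGGUUGUAUAGUU`) the seed is `GAGGUAG`, and `seed_site_7mer` returns `CTACCTC`, the familiar let-7 site found in target 3' UTRs.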

Figure 2: Distribution of strand-asymmetry z-values in 3' UTR and upstream regions. Distribution of the absolute value of z (defined in the text as a measure of strand asymmetry) over all possible 7-mers, in 3' UTR regions (red) and in the 3000 bp upstream of the TSS (grey).
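The text that defines z is not part of this excerpt. One natural choice, assumed here and not taken from the paper, compares the count n+ of a 7-mer on the transcribed strand with the count n- of its reverse complement on the same strand (equivalently, the count of the 7-mer on the opposite strand). Under a null of no strand preference, n+ follows Binomial(n+ + n-, 1/2), which gives z = (n+ - n-) / sqrt(n+ + n-) and a two-sided normal-approximation P-value. The function names and the default of 4^k Bonferroni tests are hypothetical.

```python
from collections import Counter
from itertools import product
from math import erfc, sqrt

# DNA complement table; symbols other than A/C/G/T pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seqs, k):
    """Count all overlapping k-mers on the given (transcribed) strand."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def strand_asymmetry(seqs, k=7, n_tests=None):
    """Return {kmer: (z, bonferroni_p)} for every k-mer seen on either strand.

    Hypothetical definition: z = (n_plus - n_minus) / sqrt(n_plus + n_minus),
    where n_minus counts the reverse complement on the same strand.
    """
    counts = kmer_counts(seqs, k)
    words = ["".join(letters) for letters in product("ACGT", repeat=k)]
    m = n_tests if n_tests is not None else len(words)  # Bonferroni multiplier
    results = {}
    for w in words:
        n_plus, n_minus = counts[w], counts[revcomp(w)]
        total = n_plus + n_minus
        if total == 0:
            continue  # never observed on either strand: no information
        z = (n_plus - n_minus) / sqrt(total)
        p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
        results[w] = (z, min(1.0, m * p_two_sided))
    return results
```

With k = 7 every word differs from its reverse complement (an odd-length word cannot be its own reverse complement), so each asymmetric pair shows up twice with opposite signs of z and the same |z|, which is the quantity histogrammed in Fig. 2. Whether the authors counted 4^7 or 4^7/2 tests for the Bonferroni correction is not stated here, so `n_tests` is left as a parameter.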

Mentions: In the human case, 214 7-mers showed strand asymmetry with a Bonferroni-corrected P-value below 0.01; in the mouse, 139 did. Of these, 113 were common to both species (compared with ~2 expected by chance): evolutionary conservation was thus recovered a posteriori, providing strong support for the biological relevance of the binding sites identified by the method. The lists of strand-asymmetric 7-mers for human and mouse are reported in the supplementary material [see Additional files 2 and 3, respectively]. As a control, we applied the same analysis to the genomic sequence lying upstream of the transcription start site (TSS) of annotated genes. Since these regions are not transcribed, we do not expect significant deviations from randomness in the distribution of strand asymmetry. Fig. 2 shows the distribution of z-values calculated on upstream regions of 3000 bp, together with the corresponding distribution for 3' UTR regions. As expected, the distribution for upstream regions is much narrower: only five 7-mers showed significant strand asymmetry. Given the difficulty of determining the TSS with automated annotation systems, these cases can probably be explained by the erroneous inclusion of sequence fragments that are actually transcribed.
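The chance overlap quoted above can be checked with a short calculation. Assuming, for this sketch only, that the human and mouse lists behave like independent random draws from all 4^7 = 16,384 possible 7-mers, the size of their overlap follows a hypergeometric distribution with mean 214 × 139 / 16384 ≈ 1.82, in line with the ~2 given in the text. The helper names are hypothetical.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def expected_overlap(n1: int, n2: int, universe: int) -> float:
    """Mean overlap of two random subsets of sizes n1 and n2 drawn from `universe` items."""
    return n1 * n2 / universe


def overlap_tail(observed: int, n1: int, n2: int, universe: int) -> float:
    """Hypergeometric P(overlap >= observed), computed term by term in log space."""
    total = 0.0
    for k in range(observed, min(n1, n2) + 1):
        if n2 - k > universe - n1:
            continue  # impossible configuration: not enough items outside list 1
        total += exp(log_comb(n1, k) + log_comb(universe - n1, n2 - k) - log_comb(universe, n2))
    return total


UNIVERSE = 4 ** 7  # all possible 7-mers (assumption of this sketch)
mean = expected_overlap(214, 139, UNIVERSE)
p_113 = overlap_tail(113, 214, 139, UNIVERSE)
```

Under the same independence assumption, the probability of an overlap of 113 or more is vanishingly small (far below 1e-100), which is what makes the a posteriori recovery of conservation informative.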