Limits...
RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

Horesh Y, Wexler Y, Lebenthal I, Ziv-Ukelson M, Unger R - BMC Bioinformatics (2009)

Bottom Line: Recently an O(NL2) solution for this problem has been described.Here, we describe and implement an O(NLpsi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed.This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel. yair.horesh@weizmann.ac.il

ABSTRACT

Background: Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL2) solution for this problem has been described.

Results: Here, we describe and implement an O(NLpsi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction of various genomic regions in several organisms including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias which affects the MFE, however we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of a mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome.

Conclusion: The efficient consecutive windows folding engine described in this paper allows for genome wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.

Show MeSH

Related in: MedlinePlus

The MFE landscape and the location of ncRNA molecules. The MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt from location 88,100,000 to 88,400,000 on mouse chromosome 1. There are four ncRNA molecules of length longer than 1,000 nt in this region. For three out of the four molecules there is a (negative) peak around the genomic location of the ncRNA molecules. Note however that this example is not typical, and there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are also ncRNA molecules that don't reside within or near MFE peaks. The figure was prepared using the Lightweight Genome Viewer available at .
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2682796&req=5

Figure 6: The MFE landscape and the location of ncRNA molecules. The MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt from location 88,100,000 to 88,400,000 on mouse chromosome 1. There are four ncRNA molecules of length longer than 1,000 nt in this region. For three out of the four molecules there is a (negative) peak around the genomic location of the ncRNA molecules. Note however that this example is not typical, and there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are also ncRNA molecules that don't reside within or near MFE peaks. The figure was prepared using the Lightweight Genome Viewer available at .

Mentions: The low MFE for windows associated with ncRNA applies to group averages and thus it is an observation on a statistical level. For some of the ncRNA molecules, there is a correspondence between (negative) peaks of the MFE landscape and the location of individual ncRNA molecules. Figure 6 shows the MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt on mouse chromosome 1, where four ncRNA molecules of length longer than 1,000 nt reside. For three out of the four molecules there is a peak in the genomic location of the ncRNA molecules. We must stress however that this example is not the rule; there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are ncRNA molecules that do not reside within or near MFE peaks.


RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

Horesh Y, Wexler Y, Lebenthal I, Ziv-Ukelson M, Unger R - BMC Bioinformatics (2009)

The MFE landscape and the location of ncRNA molecules. The MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt from location 88,100,000 to 88,400,000 on mouse chromosome 1. There are four ncRNA molecules of length longer than 1,000 nt in this region. For three out of the four molecules there is a (negative) peak around the genomic location of the ncRNA molecules. Note however that this example is not typical, and there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are also ncRNA molecules that don't reside within or near MFE peaks. The figure was prepared using the Lightweight Genome Viewer available at .
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2682796&req=5

Figure 6: The MFE landscape and the location of ncRNA molecules. The MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt from location 88,100,000 to 88,400,000 on mouse chromosome 1. There are four ncRNA molecules of length longer than 1,000 nt in this region. For three out of the four molecules there is a (negative) peak around the genomic location of the ncRNA molecules. Note however that this example is not typical, and there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are also ncRNA molecules that don't reside within or near MFE peaks. The figure was prepared using the Lightweight Genome Viewer available at .
Mentions: The low MFE for windows associated with ncRNA applies to group averages and thus it is an observation on a statistical level. For some of the ncRNA molecules, there is a correspondence between (negative) peaks of the MFE landscape and the location of individual ncRNA molecules. Figure 6 shows the MFE landscape (using a sliding window of 2,000 nt) of a region of 300,000 nt on mouse chromosome 1, where four ncRNA molecules of length longer than 1,000 nt reside. For three out of the four molecules there is a peak in the genomic location of the ncRNA molecules. We must stress however that this example is not the rule; there are many peaks in the MFE landscape that do not correspond to ncRNA molecules, and there are ncRNA molecules that do not reside within or near MFE peaks.

Bottom Line: Recently an O(NL2) solution for this problem has been described.Here, we describe and implement an O(NLpsi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed.This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel. yair.horesh@weizmann.ac.il

ABSTRACT

Background: Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL2) solution for this problem has been described.

Results: Here, we describe and implement an O(NLpsi(L)) engine for the consecutive windows folding problem, where psi(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction of various genomic regions in several organisms including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias which affects the MFE, however we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of a mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome.

Conclusion: The efficient consecutive windows folding engine described in this paper allows for genome wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.

Show MeSH
Related in: MedlinePlus