Computational RNA secondary structure design: empirical complexity and improved methods.
Bottom Line:
Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure.Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.
Affiliation: Institute of Applied Mathematics, University of British Columbia, Vancouver, BC, Canada. rosalia@cs.ubc.ca <rosalia@cs.ubc.ca>
ABSTRACT
Show MeSH
Background: We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations. Results: To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure. Conclusion: Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved. |
Related In:
Results -
Collection
License getmorefigures.php?uid=PMC1808480&req=5
Mentions: In preliminary experiments, we found that some structures are very difficult to design by using the hierarchical decomposition of Andronescu et al. [9]. This is the case, for example, for structures that have two loops separated by a very short stem (see Figure 1a). Recall that after splitting a structure (which is always done at a multiloop [9]), it is necessary to connect the two free ends created by the split such that both resulting substructures have exactly two free ends. To create structural boundary conditions at the split points that are similar to those of the original structure, this connection is achieved by merging the free ends of one fragment with those of a static cap structure, which is a small hairpin loop of size four (consisting of four unpaired bases and five paired bases, see Figure 1b); furthermore, two unpaired bases are added to the free ends of the other fragment if it contains a bulge directly after the first base pair. (Note that the example structure in Figure 1a does not contain bulges; therefore, no unpaired bases are added after splitting, resulting in the fragment shown in Figure 1c). |
View Article: PubMed Central - HTML - PubMed
Affiliation: Institute of Applied Mathematics, University of British Columbia, Vancouver, BC, Canada. rosalia@cs.ubc.ca <rosalia@cs.ubc.ca>
Background: We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.
Results: To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure.
Conclusion: Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.