Limits...
Computational RNA secondary structure design: empirical complexity and improved methods.

Aguirre-Hernández R, Hoos HH, Condon A - BMC Bioinformatics (2007)

Bottom Line: Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure.Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Applied Mathematics, University of British Columbia, Vancouver, BC, Canada. rosalia@cs.ubc.ca <rosalia@cs.ubc.ca>

ABSTRACT

Background: We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.

Results: To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure.

Conclusion: Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.

Show MeSH

Related in: MedlinePlus

Scaling analysis of RNA-SSD and RNAinverse. Scaling analysis of the expected run time (y-axis) of structures of lengths 50, 75, 100, 125, 150, 200 and 450 (x-axis). A logarithmic scale is used on both axes. The lines correspond to best fits of the data, for structures with lengths 50 to 150, using a polynomial that is specified in each case. The expected run time for structures longer than 150 appear close to the corresponding fit line. (a) Expected run time of RNA-SSD to design biological structures and median (Q50), 0.1-quantile (Q10) and 0.9-quantile (Q90) of expected run time for RNA-SSD applied to biologically motivated structures. (b) Median of expected run time of random and biologically motivated structures using RNA-SSD and RNAinverse. The structures of length 200 are the largest structures from the respective data sets that we designed with RNAinverse.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1808480&req=5

Figure 2: Scaling analysis of RNA-SSD and RNAinverse. Scaling analysis of the expected run time (y-axis) of structures of lengths 50, 75, 100, 125, 150, 200 and 450 (x-axis). A logarithmic scale is used on both axes. The lines correspond to best fits of the data, for structures with lengths 50 to 150, using a polynomial that is specified in each case. The expected run time for structures longer than 150 appear close to the corresponding fit line. (a) Expected run time of RNA-SSD to design biological structures and median (Q50), 0.1-quantile (Q10) and 0.9-quantile (Q90) of expected run time for RNA-SSD applied to biologically motivated structures. (b) Median of expected run time of random and biologically motivated structures using RNA-SSD and RNAinverse. The structures of length 200 are the largest structures from the respective data sets that we designed with RNAinverse.

Mentions: Figure 2a shows the median expected run time for different structure lengths (where the median is over the structures in a set and the expectation is over multiple runs of the algorithm on a given structure), as well as the expected run time for the structure at the 10% and the 90% quantile for the biologically motivated structures. We also show the expected run times for the set of real biological structures summarised in Table 2. Notice that the empirical complexity for designing these real structures fits well within the range of complexity observed for our biologically motivated sets of structures, which provides some evidence that the probabilistic model underlying these sets is reasonably plausible for the purposes of this study. Furthermore, the data in Figure 2a indicate that the expected run time of RNA-SSD scales polynomially with structure size for median difficulty as well as for the 10% and 90% quantiles of these structure distributions, where the degree of the polynomial is higher for higher quantiles.


Computational RNA secondary structure design: empirical complexity and improved methods.

Aguirre-Hernández R, Hoos HH, Condon A - BMC Bioinformatics (2007)

Scaling analysis of RNA-SSD and RNAinverse. Scaling analysis of the expected run time (y-axis) of structures of lengths 50, 75, 100, 125, 150, 200 and 450 (x-axis). A logarithmic scale is used on both axes. The lines correspond to best fits of the data, for structures with lengths 50 to 150, using a polynomial that is specified in each case. The expected run time for structures longer than 150 appear close to the corresponding fit line. (a) Expected run time of RNA-SSD to design biological structures and median (Q50), 0.1-quantile (Q10) and 0.9-quantile (Q90) of expected run time for RNA-SSD applied to biologically motivated structures. (b) Median of expected run time of random and biologically motivated structures using RNA-SSD and RNAinverse. The structures of length 200 are the largest structures from the respective data sets that we designed with RNAinverse.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1808480&req=5

Figure 2: Scaling analysis of RNA-SSD and RNAinverse. Scaling analysis of the expected run time (y-axis) of structures of lengths 50, 75, 100, 125, 150, 200 and 450 (x-axis). A logarithmic scale is used on both axes. The lines correspond to best fits of the data, for structures with lengths 50 to 150, using a polynomial that is specified in each case. The expected run time for structures longer than 150 appear close to the corresponding fit line. (a) Expected run time of RNA-SSD to design biological structures and median (Q50), 0.1-quantile (Q10) and 0.9-quantile (Q90) of expected run time for RNA-SSD applied to biologically motivated structures. (b) Median of expected run time of random and biologically motivated structures using RNA-SSD and RNAinverse. The structures of length 200 are the largest structures from the respective data sets that we designed with RNAinverse.
Mentions: Figure 2a shows the median expected run time for different structure lengths (where the median is over the structures in a set and the expectation is over multiple runs of the algorithm on a given structure), as well as the expected run time for the structure at the 10% and the 90% quantile for the biologically motivated structures. We also show the expected run times for the set of real biological structures summarised in Table 2. Notice that the empirical complexity for designing these real structures fits well within the range of complexity observed for our biologically motivated sets of structures, which provides some evidence that the probabilistic model underlying these sets is reasonably plausible for the purposes of this study. Furthermore, the data in Figure 2a indicate that the expected run time of RNA-SSD scales polynomially with structure size for median difficulty as well as for the 10% and 90% quantiles of these structure distributions, where the degree of the polynomial is higher for higher quantiles.

Bottom Line: Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure.Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Applied Mathematics, University of British Columbia, Vancouver, BC, Canada. rosalia@cs.ubc.ca <rosalia@cs.ubc.ca>

ABSTRACT

Background: We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.

Results: To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure.

Conclusion: Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.

Show MeSH
Related in: MedlinePlus