Free energy estimation of short DNA duplex hybridizations.

Tulpan D, Andronescu M, Leger S - BMC Bioinformatics (2010)

Bottom Line: Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with 4 publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work based on 29 publications.


Affiliation: National Research Council of Canada, Institute of Information Technology, 100 des Aboiteaux Street, Suite 1100, Moncton, NB E1A7R1, Canada. dan.tulpan@nrc-cnrc.gc.ca

ABSTRACT

Background: Estimation of DNA duplex hybridization free energy is widely used for predicting cross-hybridizations in DNA computing and microarray experiments. A number of software programs based on different methods and parametrizations are available for the theoretical estimation of duplex free energies. However, significant differences in free energy values are sometimes observed among estimates obtained with different methods, which makes it difficult to decide which value is accurate.

Results: We present in this study a quantitative comparison of the similarities and differences among four published DNA/DNA duplex free energy calculation methods and an extended Nearest-Neighbour Model for perfect matches based on triplet interactions. The comparison was performed on a benchmark data set with 695 pairs of short oligos that we collected and manually curated from 29 publications. Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. The proposed model was trained on experimental data whose temperature, sodium concentration and sequence concentration span a wide range of values, which gives the model greater power of generalization when it is used to estimate free energies of DNA duplexes under non-standard experimental conditions.

Conclusions: Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with 4 publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work based on 29 publications. The extended Nearest-Neighbour Model based on triplet interactions presented in this work is capable of performing accurate estimations of free energies for perfect match duplexes under both standard and non-standard experimental conditions and may serve as a baseline for further developments in this area of research.
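
As a rough illustration of what "nearest-neighbour" and "triplet" contributions mean, the sketch below splits a strand into overlapping windows of length 2 (doublets) or 3 (triplets) and sums one table entry per window. This is only a minimal sketch under stated assumptions: the abstract does not give the functional form or the fitted parameters of the extended model, so the function names `windows` and `additive_free_energy`, the optional `init` term, and the numbers in `toy_triplet_params` are hypothetical placeholders, not values from the paper.

```python
def windows(seq, k):
    """Return all overlapping substrings of length k, in 5'->3' order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def additive_free_energy(seq, params, k, init=0.0):
    """Sum one table entry per overlapping window of length k.

    params maps each window (e.g. "ACG") to a free energy contribution
    in kcal/mol; init is an optional constant initiation term.
    Raises KeyError if a window is missing from the table.
    """
    return init + sum(params[w] for w in windows(seq.upper(), k))


# HYPOTHETICAL parameters, chosen only so the arithmetic is easy to follow.
toy_triplet_params = {"ACG": -1.0, "CGT": -2.0}

print(windows("ACGT", 2))  # doublets used by a classic nearest-neighbour model
print(windows("ACGT", 3))  # triplets used by a triplet-based extension
print(additive_free_energy("ACGT", toy_triplet_params, k=3))
```

A strand of length n yields n - 1 doublets but only n - 2 triplets, while the number of distinct window types grows from 16 to 64 before any symmetry reduction, so a triplet table needs more training data to fit.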



Figure 2: Correlation plot for the evaluation of secondary structure predictions (EVAL-SS) obtained with MultiRNAFold (with SantaLucia parameters) versus experimental free energies. The correlation of free energies for predicted secondary structures is shown for all 695 DNA duplexes. Different symbols and colors mark the source of each data point.

Mentions: A correlation coefficient is traditionally defined as a symmetric, scale-invariant measure of association between two random variables, which takes values between -1 and 1. The extreme values indicate a perfect positive (1) or negative (-1) correlation, while 0 means no correlation. Positive Pearson product-moment correlations are observed for all methods when experimental and evaluated or predicted free energies are considered as random variables. The highest Pearson correlation coefficients (approximately 0.75 and 0.77) are consistently obtained with the PairFold-SantaLucia method for both EVAL-FE and EVAL-SS, closely followed by UNAfold, Vienna Package and PairFold-Mathews. A major and consistent deviation of approximately 8 kcal/mol from the correlation line for the data collected from Doktycz et al. [19], and a few other minor deviations for the data collected from four additional publications [20-23], were noticed for all free energy calculation methods (see Figures 1 and 2). Most of these deviations (e.g. Doktycz et al. [19]) may stem from different free energy interpolation functions used in those studies.
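
The two metrics used in the comparison, the Pearson correlation coefficient and the root mean squared error (RMSE), can be sketched in a few lines of Python. The function names and the free energy values below are hypothetical and are not taken from the benchmark. The example applies a constant 8 kcal/mol shift to every point to show that a uniform offset leaves the correlation at 1 while the RMSE equals the offset, which is why both metrics are worth reporting together.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root mean squared error between paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# HYPOTHETICAL free energies in kcal/mol, for illustration only.
experimental = [-5.0, -7.5, -10.0, -12.5]
predicted = [e + 8.0 for e in experimental]  # uniform 8 kcal/mol offset

print(pearson_r(experimental, predicted))  # offset does not change r
print(rmse(experimental, predicted))       # offset shows up fully in RMSE
```

In a pooled data set, an offset that affects only the points from one source does lower the overall correlation, because those points fall away from the trend followed by the rest.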

