Free energy estimation of short DNA duplex hybridizations.

Tulpan D, Andronescu M, Leger S - BMC Bioinformatics (2010)

Bottom Line: Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with 4 publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work based on 29 publications.


Affiliation: National Research Council of Canada, Institute of Information Technology, 100 des Aboiteaux Street, Suite 1100, Moncton, NB E1A7R1, Canada. dan.tulpan@nrc-cnrc.gc.ca

ABSTRACT

Background: Estimation of DNA duplex hybridization free energy is widely used for predicting cross-hybridizations in DNA computing and microarray experiments. A number of software programs based on different methods and parametrizations are available for the theoretical estimation of duplex free energies. However, significant differences in free energy values are sometimes observed among estimations obtained with various methods, which makes it difficult to decide which value is the accurate one.

Results: In this study we present a quantitative comparison of the similarities and differences among four published DNA/DNA duplex free energy calculation methods and an extended Nearest-Neighbour Model for perfect matches based on triplet interactions. The comparison was performed on a benchmark data set of 695 pairs of short oligos that we collected and manually curated from 29 publications. Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. The proposed model was trained on experimental data whose temperature, sodium concentration and sequence concentration span a wide range of values, which gives the model greater power of generalization when used for free energy estimations of DNA duplexes under non-standard experimental conditions.
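The two performance measures named above, correlation and root mean squared error, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: it assumes the correlation is the Pearson coefficient (the abstract does not say which one was used), the function names are hypothetical, and the numbers in the usage lines are toy values rather than benchmark data.

```python
import math


def pearson(predicted, experimental):
    """Pearson correlation between two equal-length lists of free energies."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


def rmse(predicted, experimental):
    """Root mean squared error, in the same units as the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


# Toy values only (assumed for illustration, not taken from the paper).
toy_predicted = [-5.0, -7.5, -10.0]
toy_experimental = [-5.5, -7.0, -10.5]
print(pearson(toy_predicted, toy_experimental))
print(rmse(toy_predicted, toy_experimental))
```

A high correlation with a large RMSE would indicate a method that ranks duplexes correctly but is systematically offset, which is presumably why both measures are reported together.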

Conclusions: Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with 4 publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work based on 29 publications. The extended Nearest-Neighbour Model based on triplet interactions presented in this work is capable of performing accurate estimations of free energies for perfect match duplexes under both standard and non-standard experimental conditions and may serve as a baseline for further developments in this area of research.



Figure 12: Variation of doublet NN values for 9 sets of parameters. Free energy values corresponding to nine sets (our set and 8 others) of thermodynamic nearest-neighbour doublet parameters at 37°C are displayed in this plot. Four (Gotoh, Vologodskii, Blake and Benight) out of the eight publicly available sets of doublet parameters correspond to models that do not account for initiation penalties for duplex formations [18], and the sodium concentration for their experiments was between 0.0195 M and 0.195 M. For the other 4 sets (Breslauer, SantaLucia, Sugimoto and Unified) the sodium concentration equals 1 M.

Mentions: Table 2 presents the estimated free energy parameters for DNA doublets measured at 37°C. The set of 10 parameters is the best set obtained with the procedure explained in Table 6. We compared our set of NN free energy parameters at 37°C with eight other sets of parameters reported by SantaLucia [18], namely the sets obtained by Gotoh [24], Vologodskii [25], Breslauer [26], Blake [27], Benight [28], SantaLucia [29], Sugimoto [30] and the Unified set [31]. Our set of NN thermodynamic doublet parameters, summarized in Figure 12, differs from the unified parameters by less than 0.5 kcal/mol in 8 out of 10 cases. Our NN set also generally follows the reported qualitative trend in order of decreasing stability: GC/CG = CG/GC > GG/CC > CA/GT = GT/CA = GA/CT = CT/GA > AA/TT > AT/TA > TA/AT, with one exception: GG/CC has a higher weight than GC/CG and CG/GC. This effect could be caused by the low representation of the GG/CC doublets in the training set and by the absence of duplex initiation parameters in our model.
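The reason a doublet parameter set has exactly 10 entries is that the 16 possible dinucleotides collapse to 10 once a doublet and its reverse complement are treated as the same stack (for example, AA/TT read from the other strand is TT/AA). The sketch below enumerates them and sums doublet contributions along a perfect-match duplex without any initiation term, in line with the absence of initiation parameters mentioned above. It is a hedged illustration only: the function names are hypothetical, the choice of the alphabetically smaller string as the canonical key is an arbitrary convention, and the parameter values in the usage lines are placeholders, not the values from Table 2 or Figure 12.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_doublet(doublet):
    """Map a 5'->3' dinucleotide to a strand-independent key."""
    return min(doublet, reverse_complement(doublet))


def unique_doublets():
    """Enumerate the distinct doublets of a perfect-match duplex."""
    return sorted({canonical_doublet("".join(p)) for p in product("ACGT", repeat=2)})


def doublet_label(doublet):
    """Write a doublet in top/bottom notation, e.g. 'CA' -> 'CA/GT'."""
    return doublet + "/" + "".join(COMPLEMENT[base] for base in doublet)


def nn_free_energy(seq, params):
    """Sum doublet contributions along seq (no initiation term)."""
    return sum(params[canonical_doublet(seq[i:i + 2])] for i in range(len(seq) - 1))


doublets = unique_doublets()
print(len(doublets), [doublet_label(d) for d in doublets])

# Placeholder parameters (assumed): every doublet contributes -1.0 kcal/mol.
placeholder_params = {d: -1.0 for d in doublets}
print(nn_free_energy("ACGT", placeholder_params))
```

With a real parameter set keyed the same way, comparing two sets doublet by doublet (for instance, counting how many differ by less than 0.5 kcal/mol) reduces to a loop over these 10 keys.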

