Identification of sequence motifs significantly associated with antisense activity.

McQuisten KA, Peek AS - BMC Bioinformatics (2007)

Bottom Line: Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs.


Affiliation: Department of Bioinformatics, Integrated DNA Technologies, Coralville, IA 52241, USA. kmcquisten@idtdna.com

ABSTRACT

Background: Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features.
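Using motifs as model features can be pictured as one count per motif per oligonucleotide. The sketch below makes assumptions, because this excerpt does not describe the encoding: it counts overlapping occurrences, and the helper names (`all_motifs`, `count_overlapping`, `motif_features`) and example motifs are hypothetical. Enumerating every DNA word of length 2 to 5 gives 16 + 64 + 256 + 1024 = 1360 candidate motifs.

```python
from itertools import product


def all_motifs(min_len=2, max_len=5, alphabet="ACGT"):
    """Enumerate every word of length min_len..max_len over the alphabet."""
    return ["".join(p)
            for k in range(min_len, max_len + 1)
            for p in product(alphabet, repeat=k)]


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps ('GG' occurs twice in 'GGG')."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_features(seq, motifs):
    """Hypothetical feature vector: one overlapping-occurrence count per motif, in fixed order."""
    s = seq.upper()
    return [count_overlapping(s, m) for m in motifs]


# Hypothetical feature subset and oligo, for illustration only.
features = motif_features("ggcaacgggc", ["GG", "GGC", "AAC"])
print(len(all_motifs()), features)  # 1360 [3, 2, 1]
```

Vectors like these could then be passed to a regressor such as scikit-learn's `sklearn.svm.SVR`; the kernel and hyperparameters the authors used are not stated in this excerpt.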

Results: We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs previously reported to associate strongly with antisense activity, and have thermodynamic properties consistent with previous work relating the thermodynamic properties of sequences to their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs gave higher correlation than either the set of all possible motifs or several subsets of the significant motifs.
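One way to ask whether a motif's position relates to activity is to pair each occurrence's start position with the activity of the oligonucleotide containing it and compute a Pearson correlation. This is a hypothetical sketch, not the authors' stated procedure: the `(sequence, activity)` layout and the toy values are invented, and a real analysis would also need a significance test (for example, a permutation test).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def position_activity_pairs(motif, oligos):
    """Pair each 0-based start position of motif with the activity of its oligo.

    oligos: iterable of (sequence, activity) tuples (hypothetical layout).
    Overlapping occurrences are all recorded.
    """
    pairs = []
    for seq, activity in oligos:
        start = seq.find(motif)
        while start != -1:
            pairs.append((start, activity))
            start = seq.find(motif, start + 1)
    return pairs


# Hypothetical oligos and suppression activities, for illustration only.
data = [("GGACTT", 0.5), ("AGGCTT", 0.8), ("ACGGTT", 0.8), ("ACTGGT", 0.5)]
pos, act = zip(*position_activity_pairs("GG", data))
r = pearson_r(pos, act)
print(pos, r)
```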

Conclusion: The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency. This reinforces our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, since, unlike RNAi with its RNA-induced silencing complex (RISC), it has no enzymatic mediator to speed the process along. The independence of motif position and antisense activity also allows us to omit this feature from the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics is likely not the only factor in determining antisense efficiency.
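The motif free energies (dG) and nearest-neighbour features rest on summing stacking energies over adjacent base pairs. This excerpt does not state which parameter set the authors used, so the sketch below assumes SantaLucia's 1998 unified DNA/DNA values at 37 °C and omits initiation and symmetry terms; since the antisense target is RNA, DNA/RNA hybrid parameters (e.g. Sugimoto et al., 1995) may be more appropriate. `motif_dg` and `stack_dg` are hypothetical names.

```python
# ASSUMPTION: unified DNA/DNA nearest-neighbour dG at 37 C (kcal/mol), SantaLucia (1998).
# Keys are 5'->3' dinucleotides; the 6 remaining stacks are reached via reverse complement.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stack_dg(dinuc):
    """Stacking free energy of one dinucleotide step."""
    return NN_DG37[dinuc] if dinuc in NN_DG37 else NN_DG37[revcomp(dinuc)]


def motif_dg(motif):
    """Sum of stacking energies over a motif; no initiation or symmetry corrections."""
    m = motif.upper()
    return sum(stack_dg(m[i:i + 2]) for i in range(len(m) - 1))


print(round(motif_dg("GGGC"), 2))  # GG + GG + GC = -5.92
print(round(motif_dg("AAAC"), 2))  # AA + AA + AC(=GT) = -3.44
```

With these parameters, G/C-rich stacks (GC, CG, GG) are the most negative, which matches the direction of the G-rich "good" motifs having more negative dG.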


Figure 2: Thermodynamic Distributions of (top) Significant Motifs and (bottom) Submotif-unique Significant Motifs. The distributions of Gibbs free energy values (dG) for motifs associated with effective antisense activity (green) and those associated with ineffective antisense activity (red). The difference in average dG between "good" and "bad" motifs is greater among the subword-unique motifs (-3.45 vs. -1.66 kcal/mol) than across the entire population of significant motifs (-3.59 vs. -2.105 kcal/mol). This can be attributed to the removal, from each population, of motifs that contain a submotif belonging to the opposing group.
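The submotif-unique filtering described in this caption can be sketched as a substring test against the opposing group. Assumptions: "contain" means a plain same-strand substring match (reverse complements ignored), each group is filtered against the other group's original membership, and `submotif_unique` and the motif lists are hypothetical.

```python
def submotif_unique(own, opposing):
    """Keep motifs in `own` that contain no motif from `opposing` as a substring."""
    return [m for m in own if not any(o in m for o in opposing)]


# Hypothetical "good" and "bad" motif lists, for illustration only.
good = ["GGC", "GCA", "GG", "AGG"]
bad = ["AC", "CA", "AAC", "TGGA"]

good_u = submotif_unique(good, bad)  # "GCA" dropped: contains bad motif "CA"
bad_u = submotif_unique(bad, good)   # "TGGA" dropped: contains good motif "GG"
print(good_u, bad_u)
```

Filtering both groups against the unfiltered opposing sets keeps the result independent of which group is processed first.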

Mentions: After applying our randomization method to the dataset, we found 155 motifs that were significantly associated with effective antisense suppression and 202 motifs that were significantly associated with poor antisense activity; these are presented in Tables 1 and 2. One of the most striking differences between these two sets of motifs is their base composition. Motifs associated with effective antisense suppression are nearly half G, with the remainder split relatively evenly among the other bases (G: 48.9%, A: 15.1%, C: 17.3%, T: 18.7%). Motifs significantly associated with poor antisense activity are quite different: they contain almost no G and are instead dominated by A and C (G: 9.5%, A: 39.2%, C: 31.0%, T: 20.3%). This difference in base composition contributes to a marked difference in the thermodynamic properties of these sequences. The average dG was significantly more negative (t-test, t = -9.369, p = 1.090e-18) for motifs associated with effective antisense activity (μ = -3.593 kcal/mol, σ = 1.465 kcal/mol) than for those associated with poor activity (μ = -2.105 kcal/mol, σ = 1.517 kcal/mol). We also find that the average dG of the "good" motifs is significantly more negative (t-test, t = -3.460, p = 6.682e-4), and the average dG of the "bad" motifs significantly less negative (t-test, t = 9.374, p = 4.047e-18), than the average dG of the entire population of possible sequence motifs (μ = -3.166 kcal/mol, σ = 1.387 kcal/mol). The thermodynamic distributions of the "good" and "bad" motifs are given in the top graph of Figure 2.
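The two comparisons in this paragraph, base composition and a t-test on mean dG, can be sketched from the quoted numbers. Assumptions: composition is pooled over every base of every motif (the excerpt does not say whether it was pooled or averaged per motif), and the test is Welch's unequal-variance t-test, since the text says only "t-test". With the reported means, standard deviations and group sizes, Welch's statistic comes out close to the reported t = -9.369. The function names and example motifs are hypothetical.

```python
import math
from collections import Counter


def base_composition(motifs):
    """Percentage of each base, pooled over every position of every motif."""
    counts = Counter("".join(motifs).upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the text: "good" (n = 155) vs "bad" (n = 202) motifs.
t, df = welch_t(-3.593, 1.465, 155, -2.105, 1.517, 202)
print(f"t = {t:.2f}")  # t = -9.37

# Hypothetical motifs, for illustration only.
comp = base_composition(["GGC", "AGT"])
print(comp)
```

A p-value would additionally need the t distribution's CDF; SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test from summary statistics.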

