Localizing triplet periodicity in DNA and cDNA sequences.

Wang L, Stein LD - BMC Bioinformatics (2010)

Bottom Line: Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that: (1) the Modified Wavelet Transform (MWT) can better define the boundaries of a TP region than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with a 6 bp MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. More importantly, the TP signal has the potential to be used to detect the splice junctions in fully spliced mRNA sequences.


Affiliation: Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY 11724, USA. wangli@cshl.edu

ABSTRACT

Background: The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons contain a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in that of non-coding segments. This property has been used to identify the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size limits the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
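The windowed Fourier approach described in the Background can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common binary-indicator encoding of the four bases, and the function names (`tp_power`, `windowed_tp`), the window and step sizes, and the toy sequence with position-specific base preferences are all hypothetical choices made for the example.

```python
import cmath
import random

def tp_power(segment, period=3):
    """Spectral power of one segment at frequency 1/period, summed over A, C, G, T."""
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of this base's 0/1 indicator sequence at frequency 1/period
        coeff = sum(cmath.exp(-2j * cmath.pi * n / period)
                    for n, ch in enumerate(segment) if ch == base)
        total += abs(coeff) ** 2
    return total / len(segment)

def windowed_tp(seq, window=120, step=30):
    """Return (start, power) for each fixed-size window: a simple STFT-style scan."""
    return [(s, tp_power(seq[s:s + window]))
            for s in range(0, len(seq) - window + 1, step)]

# Toy data (hypothetical): random "non-coding" flanks around a codon-biased block.
rng = random.Random(0)

def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))

def biased_codons(n_codons):
    # Position-specific base preferences stand in for codon usage bias.
    return "".join(rng.choice("GGGA") + rng.choice("CCCT") + rng.choice("ACGT")
                   for _ in range(n_codons))

seq = random_dna(300) + biased_codons(100) + random_dna(300)
profile = windowed_tp(seq)
for start, power in profile:
    print(start, round(power, 2))
```

Windows that lie inside the biased block score far higher than windows in the random flanks, but every score is an average over a whole window, which is why the boundary can only be placed to within roughly one window length.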

Results: Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that: (1) the Modified Wavelet Transform (MWT) can better define the boundaries of a TP region than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with a 6 bp MWT.
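The trade-off in result (2) can be illustrated with a position-by-position transform at the fixed frequency 1/3. The abstract does not give the MWT kernel, so the sketch below is a stand-in rather than the authors' method: it assumes a Gaussian envelope whose standard deviation is `base_width / a`, chosen only so that a larger a means a narrower window. The names `gaussian_tp_profile`, `transition_width` and `base_width`, and the noise-free toy sequence, are hypothetical.

```python
import cmath
import math

def gaussian_tp_profile(seq, a, period=3, base_width=60.0):
    """Power at frequency 1/period around each position, using a Gaussian envelope.

    ASSUMPTION: envelope std is base_width / a, so larger a gives a narrower window.
    This is an illustrative stand-in, not the paper's MWT kernel.
    """
    sigma = base_width / a
    half = math.ceil(3 * sigma)               # truncate the envelope at 3 sigma
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) * cmath.exp(-2j * math.pi * k / period)
              for k in offsets]
    norm = sum(abs(w) for w in kernel)
    profile = []
    for center in range(len(seq)):
        coeffs = dict.fromkeys("ACGT", 0j)    # one complex coefficient per base
        for k, w in zip(offsets, kernel):
            pos = center + k
            if 0 <= pos < len(seq) and seq[pos] in coeffs:
                coeffs[seq[pos]] += w
        profile.append(sum(abs(c) ** 2 for c in coeffs.values()) / norm ** 2)
    return profile

def transition_width(profile, lo=0.1, hi=0.9):
    """Count positions whose power lies strictly between lo and hi of the maximum."""
    peak = max(profile)
    return sum(1 for p in profile if lo * peak < p < hi * peak)

# Noise-free toy sequence (hypothetical): a perfect 3 bp repeat between homopolymer flanks.
seq = "A" * 300 + "ATG" * 100 + "A" * 300
narrow = gaussian_tp_profile(seq, a=5)        # small window
wide = gaussian_tp_profile(seq, a=1.25)       # large window
print(transition_width(narrow), transition_width(wide))
```

Under this assumed envelope, the rise from background to plateau spans fewer positions for a = 5 than for a = 1.25, which matches the sharper boundaries reported for larger scales. The narrower window also averages over fewer bases, which is the intuition behind the lower signal-to-noise ratio on real, noisy sequence.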

Conclusions: MWT can provide more precise TP boundaries than STFT, and the boundaries can be further refined by using a bigger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect the splice junctions in fully spliced mRNA sequences.
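One way to picture why subtracting a 6 bp signal can suppress false positives: a 6 bp tandem repeat has a harmonic at frequency 1/3, so it can look like a TP signal, whereas a pure 3 bp pattern has no component at 1/6. The sketch below assumes, purely for illustration, that the correction is the 3 bp power minus the 6 bp power, floored at zero; the abstract does not specify the exact form, and the function names and toy repeats are hypothetical.

```python
import cmath

def power_at_period(segment, period):
    """Spectral power at frequency 1/period, summed over the four base indicators."""
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * n / period)
                    for n, ch in enumerate(segment) if ch == base)
        total += abs(coeff) ** 2
    return total / len(segment)

def corrected_tp(segment):
    """ASSUMED correction: 3 bp power minus 6 bp power, floored at zero."""
    return max(0.0, power_at_period(segment, 3) - power_at_period(segment, 6))

coding_like = "ATG" * 40         # pure 3 bp repeat: no component at frequency 1/6
hexamer_repeat = "AAAAAC" * 20   # 6 bp repeat whose harmonic falls at frequency 1/3

for name, s in [("coding_like", coding_like), ("hexamer_repeat", hexamer_repeat)]:
    print(name, round(power_at_period(s, 3), 3), round(corrected_tp(s), 3))
```

In this toy case the hexamer repeat shows clear power at 1/3 before correction and essentially none after it, while the 3 bp repeat is unaffected. Real repeats will not cancel this cleanly, so the example only shows the direction of the effect.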


Figure 3: PSD plot of sequence F56F11.4 without introns. PSD plot at two different scales, 5 and 1.25 for MWT. The line segments on the bottom show the splice junctions.

Mentions: After removing the introns of gene F56F11.4, we merged the exons and plotted the PSD under two different choices of scale parameter (Figure 3). The plots show a dramatic increase in PSD at the transition between non-coding and coding sequence. At a scale of 1.25, the PSD plot clearly distinguishes the 5' and 3' UTRs from the coding region. At a scale of 5, however, the PSD plot gives a better indication of the TP boundary on both sides, separating the 5' UTR from the first exon and the last exon from the 3' UTR. In practice, larger values of the scale parameter give a higher resolution, revealing details hidden within the broad PSD peak obtained at smaller scales. In Figure 3, the horizontal line represents the coding region, with vertical lines marking the boundaries of individual exons.

