Identification of exonic regions in DNA sequences using cross-correlation and noise suppression by discrete wavelet transform.

Abbasi O, Rostami A, Karimian G - BMC Bioinformatics (2011)

Bottom Line: The method reduces the dependence of identification accuracy on window length. The proposed method increased the accuracy of exon detection by 4% to 41% relative to the most common digital signal processing methods for exon prediction. In addition, discrete wavelet transform (DWT) can minimise noise while maintaining the signal.

Affiliation: School of Engineering-Emerging Technologies, University of Tabriz, Tabriz 5166614761, Iran.

ABSTRACT

Background: The identification of protein coding regions (exons) in DNA sequences using signal processing techniques is an important component of bioinformatics and biological signal processing. In this paper, a new method is presented for the identification of exonic regions in DNA sequences. This method is based on the cross-correlation technique that can identify periodic regions in DNA sequences.
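As an illustration of how cross-correlation can expose periodic regions, the sketch below correlates a binary indicator sequence for each nucleotide with a period-3 impulse train and measures how much the correlation varies across lags 0 to 2. This is a minimal sketch and not the authors' algorithm: the impulse-train template, the lag range, and the names `periodic_template`, `cross_correlation` and `period3_score` are all assumptions made for this example.

```python
def indicator(seq, base):
    """Binary indicator sequence: 1.0 where seq has `base`, else 0.0."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def periodic_template(length, period=3):
    """Hypothetical reference signal: an impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def cross_correlation(x, y, lag):
    """r[lag] = sum over n of x[n] * y[n + lag], dropping terms past the end of y."""
    return sum(x[n] * y[n + lag] for n in range(len(x)) if n + lag < len(y))


def period3_score(seq, period=3):
    """Sum, over the four bases, of the spread of the cross-correlation across lags.

    A strongly period-3 sequence correlates well at one lag and poorly at the
    others, so the spread is large; an aperiodic sequence gives a flat
    correlation and a small spread.
    """
    total = 0.0
    for base in "ACGT":
        x = indicator(seq, base)
        template = periodic_template(len(x) + period, period)
        r = [cross_correlation(x, template, lag) for lag in range(period)]
        total += max(r) - min(r)
    return total


print(period3_score("ATG" * 30))  # strongly period-3 sequence
print(period3_score("A" * 90))    # no period-3 structure
```

In practice such a score would be evaluated over a sliding window along the sequence to obtain a position-dependent track.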

Results: The method reduces the dependence of identification accuracy on window length. The proposed algorithm was applied to different eukaryotic datasets and the results were compared with those of other established methods. The proposed method increased the accuracy of exon detection by 4% to 41% relative to the most common digital signal processing methods for exon prediction.

Conclusions: We demonstrated that periodic signals can be estimated using cross-correlation. In addition, discrete wavelet transform (DWT) can minimise noise while maintaining the signal. The proposed algorithm, which combines cross-correlation and DWT, significantly increases the accuracy of exonic region identification.
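The idea of suppressing noise with a DWT while keeping the signal can be sketched as follows: decompose the signal, shrink the small detail coefficients, and reconstruct. The abstract does not state the wavelet family, decomposition level or threshold rule, so the single-level Haar wavelet, the soft threshold, and the function names here are assumptions for illustration only.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """Single-level Haar DWT of an even-length signal: (approximation, detail)."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient sets."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; values below t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Remove small detail coefficients (assumed to be noise), keep the rest."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, t))


# Hypothetical noisy score track: a high plateau followed by a low one.
noisy = [10.0, 10.2, 9.8, 10.0, 0.1, -0.1, 0.0, 0.2]
print(denoise(noisy, t=1.0))
```

The small fluctuations within each plateau are flattened, while the large step between the two plateaus is kept.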

Figure 5: The identification of exonic regions in the gene sequence F56F11.4. The results of exonic region identification on the sequence F56F11.4 (8,000 bp) are plotted for different methods: (a) cross-correlation (proposed), (b) AN filter, (c) TDP and (d) DFT. The shaded regions are the exonic regions that must be identified.

Mentions: The training data consist of 1,000 multi-exon genes from chromosome III of C. elegans. The calculated threshold level is 61. This threshold was applied to the F56F11.4 gene on chromosome III of C. elegans, as shown in Figure 5a. At this threshold, all five regions are correctly identified as coding regions. However, small non-coding regions around position 2000 are misidentified as coding regions. Since the characteristics of a DNA sequence can change significantly from one position to another, even within the same dataset, a static threshold may yield incorrect identifications at some positions. Therefore, an adaptive threshold selection algorithm such as that described in [22] is required for exon prediction. In Tables 1, 2, 3 and 4, the proposed algorithm is compared with other algorithms over a range of thresholds.
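To make the static-threshold step concrete, the sketch below marks every run of positions whose score reaches a fixed threshold as a predicted coding region. Only the threshold value of 61 comes from the text; the score values and the function name `regions_above` are hypothetical, and the adaptive scheme of [22] is not shown.

```python
def regions_above(scores, threshold):
    """Return half-open (start, end) index ranges where scores >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a region opens here
        elif score < threshold and start is not None:
            regions.append((start, i))     # the region closed before i
            start = None
    if start is not None:                  # region runs to the end of the track
        regions.append((start, len(scores)))
    return regions


# Hypothetical score track; 61 is the threshold level quoted in the text.
scores = [10, 70, 80, 20, 65, 65, 30]
print(regions_above(scores, 61))
```

With a single fixed threshold, a short noise peak that exceeds the level is reported as a region in the same way as a true exon, which is the failure mode described above around position 2000.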

