Classification of genomic signals using dynamic time warping.

Skutkova H, Vitek M, Babula P, Kizek R, Provaznik I - BMC Bioinformatics (2013)

Bottom Line: The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of the organisms. Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis based on alignment of symbolic DNA sequences; in addition, it is a robust, quick and more precise technique.


ABSTRACT

Background: DNA classification methods most commonly compare differences between symbolic DNA records, which requires global multiple sequence alignment. This approach is often inappropriate: it introduces a number of imprecisions and requires additional user intervention to align similar segments exactly. When DNA is represented as a signal, similar segments are characterized by a similar curve shape. Alignment of genomic signals can therefore adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace multiple alignment of symbolic sequences in applications such as phylogenetic analysis.

Methods: The proposed method consists of three main parts. The first part converts the symbolic representation of a DNA sequence, a string of the symbols A, C, G and T, into a signal representation in the form of the cumulated phase of the complex components defined for each symbol. The second part adjusts the size of the signals using standard signal preprocessing methods: median filtering, detrending and resampling. The final part, necessary for comparing genomic signals, aligns their position and length by dynamic time warping (DTW).
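The first two parts can be illustrated with a minimal Python sketch. The abstract does not give the complex value assigned to each nucleotide, so the mapping below (A = 1+j, C = -1-j, G = -1+j, T = 1-j) is an assumption chosen for illustration, as are the function names, the median window, the linear detrending and the plain integer decimation used in place of resampling. It is not the authors' implementation.

```python
import cmath
from statistics import median

# ASSUMPTION: this nucleotide-to-complex mapping is illustrative only;
# the abstract does not state which complex components the authors used.
SYMBOL_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def cumulated_phase(sequence):
    """Running sum of the phase (in radians) of each nucleotide's complex value.

    Symbols other than A, C, G, T raise KeyError.
    """
    signal = []
    total = 0.0
    for symbol in sequence.upper():
        total += cmath.phase(SYMBOL_TO_COMPLEX[symbol])
        signal.append(total)
    return signal


def median_filter(signal, window=3):
    """Sliding median with an odd window; the edges use a truncated window."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


def detrend_linear(signal):
    """Subtract the least-squares straight line fitted to the signal."""
    n = len(signal)
    if n < 2:
        return [0.0] * n
    mean_x = (n - 1) / 2
    mean_y = sum(signal) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(signal))
    slope = sxy / sxx
    return [y - (mean_y + slope * (i - mean_x)) for i, y in enumerate(signal)]


def downsample(signal, factor):
    """Keep every `factor`-th sample (plain decimation, no anti-alias filter)."""
    return signal[::factor]


# Hypothetical usage on a toy sequence.
toy_signal = downsample(detrend_linear(median_filter(cumulated_phase("ACGTTGCA"))), 2)
```

The cumulated phase turns a symbol string into a curve, so the later steps can treat DNA like any other one-dimensional signal.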

Results: The application of DTW to a set of genomic signals was evaluated by constructing a dendrogram using cluster analysis. The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of the organisms. The method is more resistant to errors in the sequences and less dependent on the number of input sequences.
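As a rough sketch of how DTW can produce the input for cluster analysis, the Python code below computes a textbook DTW distance between two signals and fills a symmetric pairwise distance table. The absolute-difference local cost, the unconstrained warping path and the function names are assumptions made here; the abstract does not specify the authors' DTW variant or the clustering linkage.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    ASSUMPTION: local cost is the absolute difference and the warping path
    is unconstrained (no window); the source does not state these choices.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the accumulated cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def distance_table(signals):
    """Symmetric table of pairwise DTW distances for a list of signals."""
    count = len(signals)
    table = [[0.0] * count for _ in range(count)]
    for i in range(count):
        for j in range(i + 1, count):
            d = dtw_distance(signals[i], signals[j])
            table[i][j] = d
            table[j][i] = d
    return table
```

A table like this is the usual input to a hierarchical clustering routine, which then yields the dendrogram. Because DTW may stretch one signal against the other, signals of different lengths can be compared without padding.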

Conclusions: Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis based on alignment of symbolic DNA sequences; in addition, it is a robust, quick and more precise technique.


Figure 4: The influence of the downsampling factor on genomic signals. a) The dependence of the change in pairwise distance on downsampling; b) The dependence of DTW processing time on downsampling.

Mentions: Applying the proposed method to very different sequences (different genes with different lengths) could lead to inappropriate settings of the preprocessing parameters and thus to errors in similarity evaluation. This problem is similar to choosing the parameters of global multiple sequence alignment, such as the scoring matrix or gap penalties. Low computational load is the greatest advantage of similarity analysis of DNA represented by genomic signals. Moreover, the signal processing time decreases exponentially with increasing downsampling factor, as shown in Figure 4b. The graph was obtained by processing 10 sequences. The elapsed time for global multiple alignment of this dataset was 49 seconds on a standard PC without parallel processing. The time for DTW evaluation of the 10 genomic signals downsampled by a factor of 10 decreased to 2.1 seconds. The signal still contains more than 99.5 % of the useful information after downsampling by a factor of 10. In addition, Figure 4a shows that increasing the downsampling factor has no significant effect on the distance differences. The distance difference was calculated as the percentage value of the sum of squared differences between the distance tables calculated for signals with and without downsampling. The dependence of the distance differences on the downsampling factor is almost linear up to a downsampling factor of 10 (details in Figure 4a), and the differences then increase very slowly up to a factor of 70. Above a downsampling factor of 70, the percentage distance differences begin to increase sharply, but none of these changes exceeds 5 percent.
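The distance-difference measure described above can be sketched in Python as follows. The text does not state how the percentage is normalised, so dividing by the sum of squared reference distances is an assumption, as is the function name. The sketch also assumes that both tables are already on a comparable scale.

```python
def percent_squared_difference(reference, candidate):
    """Percentage sum of squared differences between two distance tables.

    ASSUMPTION: the sum of squared differences is normalised by the sum of
    squared entries of the reference table (the one without downsampling);
    the source does not give the exact normalisation.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        for ref_value, cand_value in zip(ref_row, cand_row):
            numerator += (ref_value - cand_value) ** 2
            denominator += ref_value ** 2
    if denominator == 0:
        raise ValueError("reference table contains only zeros")
    return 100.0 * numerator / denominator


# Hypothetical 2 x 2 tables: distances without and with downsampling.
full_table = [[0.0, 2.0], [2.0, 0.0]]
downsampled_table = [[0.0, 1.0], [1.0, 0.0]]
difference = percent_squared_difference(full_table, downsampled_table)
```

Under this normalisation, a value of 0 means that downsampling left every pairwise distance unchanged.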

