Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

Borrayo E, Mendizabal-Ruiz EG, Vélez-Pérez H, Romo-Vázquez R, Mendizabal AP, Morales JA - PLoS ONE (2014)

Bottom Line: We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.


Affiliation: Computer Sciences Department, CUCEI - Universidad de Guadalajara, Guadalajara, México.

ABSTRACT
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
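To illustrate why a doublet-based mapping yields more amplitude values than a single-nucleotide mapping, here is a minimal Python sketch. The abstract does not give the actual amplitude assigned to each doublet, so the lexicographic numbering below, the use of overlapping doublets, and the names `DOUBLET_VALUE`, `doublet_signal`, and `single_base_signal` are all assumptions made for illustration, not the authors' mapping.

```python
from itertools import product

BASES = "ACGT"

# Hypothetical assignment: each of the 16 doublets gets its lexicographic
# index (AA -> 0, AC -> 1, ..., TT -> 15). The paper's real values may differ.
DOUBLET_VALUE = {"".join(p): i for i, p in enumerate(product(BASES, repeat=2))}


def doublet_signal(seq):
    """Map a DNA string to amplitudes, one per overlapping doublet (assumed)."""
    seq = seq.upper()
    return [DOUBLET_VALUE[seq[i:i + 2]] for i in range(len(seq) - 1)]


def single_base_signal(seq):
    """Baseline single-nucleotide mapping with only 4 possible amplitudes."""
    return [BASES.index(b) for b in seq.upper()]
```

Under these assumptions a single-base signal can take 4 distinct values, while the doublet signal can take 16, which is the increase in amplitude range the abstract refers to.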



pone-0110954-g005: Times required to determine the distance matrix using NW and GAFD. A 10,000 nt random sequence was created. Using this sequence as a template, another was created that included 10 random substitutions. That new sequence then became the template for the next sequence, which received 10 further random substitutions at bases that had not yet been mutated. The process was repeated until 20% of the sequence had changed. Then, both NW and GAFD were used to build distance matrices with an increasing number of sequences, and the computing time was recorded. Results are plotted on a logarithmic scale.
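The sequence-generation procedure in this caption can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `mutation_series`, its parameters, the fixed seed, and the choice to always substitute a different base are all hypothetical.

```python
import random


def mutation_series(length=10_000, step=10, max_fraction=0.20, seed=0):
    """Return a list of sequences, each `step` substitutions from the previous.

    Positions are never mutated twice, so after k rounds exactly k * step
    bases differ from the original template.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    current = [rng.choice(bases) for _ in range(length)]
    series = ["".join(current)]

    # Shuffle all positions once; popping from this list guarantees that
    # every substitution lands on a base that has not been mutated yet.
    untouched = list(range(length))
    rng.shuffle(untouched)

    max_mutations = int(round(max_fraction * length))
    mutated = 0
    while mutated + step <= max_mutations:
        for _ in range(step):
            pos = untouched.pop()
            # Assumption: a substitution always changes the base.
            current[pos] = rng.choice([b for b in bases if b != current[pos]])
        mutated += step
        series.append("".join(current))
    return series
```

With the default parameters this yields the template plus 200 derived sequences, the last of which differs from the template at 2,000 positions (20% of 10,000).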

Mentions: Figure 5 shows the time required to compute the distance matrices with NW and GAFD on a desktop PC (Intel Core i7, 2 GHz, 6 GB RAM) for different numbers of sequences. GAFD ran faster than NW even though it was implemented in high-level MATLAB code. We believe that this performance could be improved further by using a lower-level language (e.g., C++) and tools such as GPU and parallel computing. A comparison of computing times with Phylip was not necessary because that method does not compute a similarity matrix.
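A timing experiment of this kind can be sketched as a small harness that builds the pairwise distance matrix for growing subsets of the sequences and records the elapsed time. The excerpt does not describe the GAFD or NW implementations, so the Hamming distance below is only a stand-in, and the names `distance_matrix` and `time_matrix_builds` are hypothetical; any distance function with the same two-argument signature could be passed in instead.

```python
import time


def hamming(a, b):
    """Stand-in distance: number of mismatching positions (not GAFD or NW)."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs, dist):
    """Build a symmetric pairwise distance matrix as a list of lists."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            matrix[i][j] = matrix[j][i] = dist(seqs[i], seqs[j])
    return matrix


def time_matrix_builds(seqs, dist, sizes):
    """Return {n: seconds} for building the matrix on the first n sequences."""
    timings = {}
    for n in sizes:
        start = time.perf_counter()
        distance_matrix(seqs[:n], dist)
        timings[n] = time.perf_counter() - start
    return timings
```

Because the number of pairs grows as n(n-1)/2, the recorded times are expected to grow roughly quadratically with the number of sequences, which is one reason to plot them on a logarithmic scale.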

