Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

Borrayo E, Mendizabal-Ruiz EG, Vélez-Pérez H, Romo-Vázquez R, Mendizabal AP, Morales JA - PLoS ONE (2014)

Bottom Line: We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.


Affiliation: Computer Sciences Department, CUCEI - Universidad de Guadalajara, Guadalajara, México.

ABSTRACT
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. One application of GSP that has not been fully explored is computing the distance between a pair of sequences. In this work we present GAFD, a novel GSP method for alignment-free distance computation. We introduce a DNA sequence-to-signal mapping function based on doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate that it is feasible to use GAFD for computing sequence distances and to use the descriptors for characterizing DNA fragments.
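Why doublets increase the number of amplitude values can be seen in a minimal Python sketch. The 16 amplitude values below are hypothetical placeholders (the values actually used are those in the paper's Table 1, not reproduced here), and the sketch takes one sample per overlapping doublet without the α window. It only illustrates that a doublet mapping can produce 16 distinct amplitude levels where a single-nucleotide mapping produces at most 4.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# HYPOTHETICAL doublet values: placeholders for the Table 1 values.
# Each of the 16 doublets (AA, AC, ..., TT) gets a distinct amplitude 1..16.
DOUBLET_VALUES = {
    a + b: float(i + 1)
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}

# HYPOTHETICAL single-nucleotide values, used only for comparison.
NUCLEOTIDE_VALUES = {n: float(i + 1) for i, n in enumerate(NUCLEOTIDES)}


def doublet_signal(seq: str) -> list[float]:
    """One sample per overlapping doublet: len(seq) - 1 samples."""
    seq = seq.upper()
    return [DOUBLET_VALUES[seq[i:i + 2]] for i in range(len(seq) - 1)]


def nucleotide_signal(seq: str) -> list[float]:
    """One sample per nucleotide, for comparison."""
    return [NUCLEOTIDE_VALUES[n] for n in seq.upper()]


if __name__ == "__main__":
    # A 17-nt sequence that contains each of the 16 doublets exactly once.
    seq = "AACAGATCCGCTGGTTA"
    print(len(set(nucleotide_signal(seq))))  # 4 distinct amplitudes
    print(len(set(doublet_signal(seq))))     # 16 distinct amplitudes
```

Any assignment of 16 distinct values would show the same effect; the choice of values (Table 1 in the paper) determines the actual signal shape.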



Figure 2. Similarity score for a sequence compared with modified versions of itself using different values of α. A 10,000 nt random sequence was created. Using this sequence as a template, a second sequence was created that included one random substitution. Each remaining sequence was built from the previously created one by adding new random substitutions. The result was an original sequence and four mutated sequences bearing 1, 3, 5, and 10 cumulative substitutions. The original sequence was then compared with each mutated sequence using different α values.
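The construction of the test sequences described in this caption can be sketched in Python as follows. Two details are assumptions not stated in the source: each new substitution falls on a position that was not mutated before (so each mutant differs from the original at exactly the stated number of sites), and a substitution always replaces a base with a different nucleotide. The function names and the seed are illustrative only.

```python
import random

NUCLEOTIDES = "ACGT"


def random_sequence(length: int, rng: random.Random) -> str:
    """Uniformly random DNA sequence of the given length."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))


def cumulative_mutants(template: str, totals: list[int],
                       rng: random.Random) -> list[str]:
    """Return one mutant per entry of `totals` (ascending), each built from
    the previous mutant by adding new random substitutions until it carries
    `total` cumulative substitutions relative to `template`.

    ASSUMPTION: positions are never reused, so substitutions cannot cancel.
    """
    chars = list(template)
    # Draw all substitution sites up front, without replacement.
    sites = rng.sample(range(len(chars)), max(totals))
    mutants, done = [], 0
    for total in totals:
        for pos in sites[done:total]:
            # Replace the base with one of the three other nucleotides.
            chars[pos] = rng.choice([b for b in NUCLEOTIDES if b != chars[pos]])
        done = total
        mutants.append("".join(chars))
    return mutants


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    original = random_sequence(10_000, rng)
    mutants = cumulative_mutants(original, [1, 3, 5, 10], rng)
    print([hamming(original, m) for m in mutants])  # [1, 3, 5, 10]
```

Drawing all sites at once without replacement is what keeps the substitution counts cumulative; sampling each round independently could hit an already-mutated site and leave fewer differences than intended.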

Our DNA sequence-to-signal mapping requires a distinct value to be set for every possible doublet (i.e., 16 different values). For all experiments presented in this section, we used the values listed in Table 1. The proposed mapping considers the nucleotides within a window whose size is defined by α; an example of the effect of α on the mapping is depicted in Figure 1. As α increases, the resulting DNA signal becomes smoother, because the values corresponding to the nucleotides within the window are combined. Hence, α determines how far a change propagates through the signal. Relative to the signal of a similar sequence, a single-nucleotide substitution produces a vertical shift whose extent depends on α: as α increases, a single substitution has less impact on the signal. Indels, in contrast, appear as a horizontal shift relative to the signal of a similar sequence, proportional to the number of inserted or deleted bases. Figure 2 shows the distance computed by GAFD for different numbers of changes in a given sequence and different values of α. Note that, unlike methods that map DNA sequences to signals using individual nucleotides, our method's robustness to subtle differences between the sequences being evaluated is governed by α. In this work, we chose to employ α = … since this value allows us to distinguish between different numbers of signal changes.
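To make the role of α concrete, the Python sketch below assumes that the window combines values by a simple moving average over α consecutive doublet values. This averaging rule, the doublet values, and all function names are assumptions for illustration, not the GAFD definition. Under these assumptions, a single substitution changes up to α + 1 samples (the change propagates further as α grows) while the peak deviation shrinks roughly as 1/α, and an insertion of k bases leaves the downstream signal unchanged but shifted right by k samples.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# HYPOTHETICAL doublet values standing in for Table 1.
DOUBLET_VALUES = {
    a + b: float(i + 1)
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def windowed_signal(seq: str, alpha: int) -> list[float]:
    """Doublet signal smoothed by a moving average over `alpha` doublet values.

    ASSUMPTION: averaging is used here as the window-combination rule.
    """
    raw = [DOUBLET_VALUES[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return [sum(raw[i:i + alpha]) / alpha for i in range(len(raw) - alpha + 1)]


def substitution_effect(seq: str, pos: int, alpha: int) -> tuple[float, int]:
    """Substitute the base at `pos` and return (peak vertical deviation,
    number of signal samples that changed)."""
    new_base = next(b for b in NUCLEOTIDES if b != seq[pos])
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    diffs = [abs(x - y) for x, y in
             zip(windowed_signal(seq, alpha), windowed_signal(mutant, alpha))]
    return max(diffs), sum(d > 1e-12 for d in diffs)


def shifted_after_insertion(seq: str, pos: int, insert: str, alpha: int) -> bool:
    """True if, downstream of an insertion at `pos`, the mutant signal equals
    the original signal shifted right by len(insert) samples."""
    mutant = seq[:pos] + insert + seq[pos:]
    orig = windowed_signal(seq, alpha)
    mut = windowed_signal(mutant, alpha)
    k = len(insert)
    return all(mut[j + k] == orig[j] for j in range(pos, len(orig)))


if __name__ == "__main__":
    seq = "AACAGATCCGCTGGTTA" * 3  # 51-nt test sequence
    for alpha in (1, 2, 4, 8):
        peak, spread = substitution_effect(seq, 20, alpha)
        print(f"alpha={alpha}: peak deviation={peak:.3f}, samples changed={spread}")
    print(shifted_after_insertion(seq, 20, "GAT", 4))  # True
```

The trade-off shown here mirrors the text: a larger window dampens the amplitude of any single change but spreads it over more samples, so the choice of α balances sensitivity to small differences against robustness.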
