Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

Borrayo E, Mendizabal-Ruiz EG, Vélez-Pérez H, Romo-Vázquez R, Mendizabal AP, Morales JA - PLoS ONE (2014)

Bottom Line: We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.


Affiliation: Computer Sciences Department, CUCEI - Universidad de Guadalajara, Guadalajara, México.

ABSTRACT
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
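To illustrate why a doublet-based mapping increases the number of possible amplitude values, the following minimal sketch maps each dinucleotide to one of 16 amplitudes, compared with only 4 for a single-nucleotide mapping. The lookup table `DOUBLET_VALUES`, the function name `sequence_to_signal`, the integer amplitudes 0 to 15, and the use of overlapping doublets are all assumptions made for illustration; the abstract does not state the amplitude assignment actually used by GAFD.

```python
from itertools import product

# Hypothetical amplitude table: each of the 16 possible doublets receives a
# distinct integer 0..15, in lexicographic order over "ACGT". The values used
# by the paper's mapping are not given here; this is for illustration only.
DOUBLET_VALUES = {"".join(pair): i for i, pair in enumerate(product("ACGT", repeat=2))}


def sequence_to_signal(seq: str) -> list[int]:
    """Map a DNA string to a numeric signal, one sample per overlapping doublet.

    Assumption: doublets overlap (step of one base), so a sequence of n bases
    yields n - 1 samples. Ambiguous bases such as 'N' raise a KeyError.
    """
    seq = seq.upper()
    return [DOUBLET_VALUES[seq[i:i + 2]] for i in range(len(seq) - 1)]


signal = sequence_to_signal("ACGT")  # doublets: AC, CG, GT
```

With a single-nucleotide mapping the signal can take only four distinct values; with doublets it can take up to sixteen, which is the property the abstract highlights.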




Figure 7 (pone-0110954-g007): Schematic of the regions where selected entries of the signal dictionary were found in the different species.

Mentions: As a preliminary domain search assay, we conducted another experiment using real data (i.e., ribosomal S18 subunit sequences from the previous 26 selected species). The signal corresponding to the Homo sapiens sequence was segmented into non-overlapping fragments of length to generate a “signal dictionary”. From the dictionary, seven entries were selected at random and compared against the complete signal set employing a sliding window of length . For each position of the sliding window, we computed the proposed similarity descriptors. We considered the segment of signal contained within the sliding window to be similar to the dictionary entry if the correlation and coherence descriptors were both larger than 0.9 and the comparison derivative was less than 0.8. The resulting alignment schematic is depicted in Figure 7. Even though the fragments were selected randomly, our results provide evidence that most mammals share similar fragments. Note that the number of shared fragments decreases as the sequences become less related to the original sequence (i.e., H. sapiens). Interestingly, insects shared the fewest fragments. These data suggest that it may be possible to determine biologically significant elements among compared sequences.
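The dictionary search described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it applies only a Pearson correlation test with the 0.9 threshold, because the coherence and derivative descriptors are not defined in this excerpt. The function names (`pearson`, `build_dictionary`, `find_matches`), the fragment length of 4, and the toy signal are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_dictionary(signal, frag_len):
    """Cut a signal into non-overlapping fragments; a trailing partial fragment is dropped."""
    return [signal[i:i + frag_len]
            for i in range(0, len(signal) - frag_len + 1, frag_len)]


def find_matches(entry, signal, threshold=0.9):
    """Slide a window the size of `entry` over `signal`, one sample at a time.

    Returns the start positions whose window correlates with `entry` above
    `threshold`. Only the correlation criterion is applied here.
    """
    width = len(entry)
    return [pos for pos in range(len(signal) - width + 1)
            if pearson(entry, signal[pos:pos + width]) > threshold]


# Toy reference signal (hypothetical data standing in for a mapped sequence).
reference = [0, 1, 2, 1, 0, 3, 5, 3, 0, 1, 2, 1]
dictionary = build_dictionary(reference, 4)
matches = find_matches(dictionary[0], reference)
```

In the experiment described, the dictionary would be built from the H. sapiens signal, seven entries would be drawn at random (for example with `random.sample(dictionary, 7)`), and each entry would be searched against the signal of every species. Because Pearson correlation is insensitive to scale, windows with the same shape but a different amplitude also pass the test, which is one reason a single descriptor is weaker than the three-descriptor criterion in the text.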

