Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

Borrayo E, Mendizabal-Ruiz EG, Vélez-Pérez H, Romo-Vázquez R, Mendizabal AP, Morales JA - PLoS ONE (2014)

Bottom Line: We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.


Affiliation: Computer Sciences Department, CUCEI - Universidad de Guadalajara, Guadalajara, México.

ABSTRACT
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
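To make the idea of a doublet-based mapping concrete, the following is a minimal sketch, not the authors' implementation. The amplitude table `DOUBLET_VALUE` (integers 0 to 15 in lexicographic order), the use of overlapping doublets, the fixed-size power spectrum, and the Euclidean distance are all assumptions chosen for illustration; this section does not give the actual values or metrics used by GAFD.

```python
import cmath
import math
from itertools import product

# Hypothetical amplitude table: each of the 16 dinucleotides gets a distinct
# integer (AA=0, AC=1, ..., TT=15). The paper's real values are not shown here.
DOUBLET_VALUE = {a + b: i for i, (a, b) in enumerate(product("ACGT", repeat=2))}


def doublet_signal(seq):
    """Map a DNA string to a numeric signal using overlapping doublets (assumed)."""
    seq = seq.upper()
    return [DOUBLET_VALUE[seq[i:i + 2]] for i in range(len(seq) - 1)]


def power_spectrum(signal, n_bins=16):
    """Power at n_bins fixed normalized frequencies (naive DFT).

    Evaluating at fixed frequencies, rather than at length-dependent ones,
    lets signals of different lengths be compared without alignment.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # drop the DC component
    spectrum = []
    for k in range(1, n_bins + 1):
        freq = k / (2 * (n_bins + 1))  # cycles per sample, strictly in (0, 0.5)
        coeff = sum(x * cmath.exp(-2j * math.pi * freq * t)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between the two power spectra (an assumed metric)."""
    pa = power_spectrum(doublet_signal(seq_a))
    pb = power_spectrum(doublet_signal(seq_b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

A single-nucleotide mapping yields at most 4 amplitude levels, whereas a doublet table such as the one above yields 16, which is the increase in possible amplitude values that the abstract refers to. Note that this sketch raises `KeyError` on ambiguous bases such as `N`.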



Figure 4: Phylogenetic trees generated by NW and GAFD for two selected orthologies: K14221 (tRNA-Asp) and K14224 (tRNA-GLU). The trees have been simplified to depict similarities in clustering. Each color represents a particular organism cluster: orange, S. cerevisiae; red, A. gossypii; green, L. thermotolerans; blue, D. hansenii.

Mentions: In this experiment, we selected evolutionary markers corresponding to coding genes (21 tRNA synthetases and 2 ribosomal proteins) and non-coding genes (20 tRNAs and 2 rRNAs). We included species present in all KEGG orthologies and then selected all entries belonging to these organisms. We constructed and compared the phylogenetic trees generated using GAFD, NW, and Phylip. Figure 4 depicts two examples of trees generated by NW and GAFD for two selected orthologies (tRNA-Asp and tRNA-GLU). Note the similarity in gene clustering by GAFD and NW. Tables 2, 3, and 4 list the similarity scores for the non-coding tRNAs, coding tRNA synthetases, and coding/non-coding ribosomal genes, respectively. The mean scores for the non-coding genes were [value missing in source], while [value missing in source] was exhibited for the coding genes. In general, the cluster-overlap scores between the methods were relatively high, indicating that GAFD can group similar sequences effectively.
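One way to quantify how closely two methods group the same genes is a pairwise agreement score. The sketch below uses the Rand index as a stand-in; it is not necessarily the similarity score reported in Tables 2 to 4, whose definition does not appear in this section. The gene identifiers and cluster labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a and labels_b map each item to a cluster label. A pair counts as
    an agreement when both clusterings put the two items together, or both
    put them apart.
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both clusterings must cover the same items")
    pairs = list(combinations(sorted(labels_a), 2))
    if not pairs:
        return 1.0  # zero or one item: trivially identical
    agree = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster assignments for four gene entries under two methods.
nw_clusters = {"sce_1": 0, "sce_2": 0, "ago_1": 1, "dha_1": 2}
gafd_clusters = {"sce_1": 0, "sce_2": 0, "ago_1": 1, "dha_1": 1}
score = rand_index(nw_clusters, gafd_clusters)
```

In this toy example the two clusterings disagree only on the pair (`ago_1`, `dha_1`), so five of the six pairs agree. Cluster labels are arbitrary, which is why the score compares co-membership rather than the label values themselves.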

