A scalable method for analysis and display of DNA sequences.

Sirovich L, Stoeckle MY, Zhang Y - PLoS ONE (2009)

Bottom Line: A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.

Affiliation: Laboratory of Applied Mathematics, Mount Sinai School of Medicine, New York, New York, United States of America. lawrence.sirovich@mssm.edu

ABSTRACT

Background: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable.

Methodology/principal findings: Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy.

Conclusions/significance: The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.

Figure 2: Correlation of test sequences with group-level indicator vectors. False-color map of 4,332 COI test sequences compared to the indicator vectors depicted in Figure 1. In all cases, the test sequences showed highest affinity with their respective group vector.

Mentions: This indicator vector analysis was based on randomly choosing representatives for each of the base group matrices. This left a set of 4,332 “test” sequences, i.e., those not used to construct indicator vectors (roughly 1,600 bird, 1,200 fish, and 1,500 butterfly sequences). We then examined how well these test sequences were correlated with the indicator vectors. More specifically, each test sequence was translated into a vector as above, and correlations to the indicator vectors were determined. In all cases, sequences from the test set were most highly correlated with the respective indicator vector for their group (Figure 2).
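
The test procedure described above (translate each test sequence into a vector, correlate it with every group's indicator vector, and assign it to the group with the highest correlation) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the one-hot encoding (4 entries per aligned site), the stand-in indicator vector (group mean minus the mean of all group means, scaled to unit length), the use of a normalized dot product as the "correlation", the function names `encode`, `indicator_vectors`, `correlations`, and `assign`, and the invented toy sequences.

```python
import numpy as np

BASES = "ACGT"

def encode(seq):
    """One-hot encode an aligned DNA sequence as a flat vector (4 entries per site)."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:  # ambiguous characters such as N stay all-zero
            vec[i, j] = 1.0
    return vec.ravel()

def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def indicator_vectors(training):
    """ASSUMED stand-in for indicator vectors: group mean minus the
    mean of all group means, scaled to unit length."""
    means = {g: np.mean([encode(s) for s in seqs], axis=0)
             for g, seqs in training.items()}
    centre = np.mean(list(means.values()), axis=0)
    return {g: unit(m - centre) for g, m in means.items()}

def correlations(seq, indicators):
    """Normalized dot product of one test sequence with each indicator vector."""
    x = unit(encode(seq))
    return {g: float(x @ v) for g, v in indicators.items()}

def assign(seq, indicators):
    """Assign a test sequence to the group whose indicator vector it matches best."""
    scores = correlations(seq, indicators)
    return max(scores, key=scores.get)

# Invented toy training data; real COI barcodes are hundreds of bases long.
training = {
    "bird": ["ACGTAC", "ACGTAA"],
    "fish": ["TTGTCC", "TTGTCA"],
    "butterfly": ["GGCAAT", "GGCAAC"],
}
indicators = indicator_vectors(training)

# A test sequence that was not used to build the indicator vectors.
print(correlations("ACGTAT", indicators))
print(assign("ACGTAT", indicators))
```

Stacking the `correlations` output for many test sequences into a matrix (one row per sequence, one column per group) gives the kind of array that a false-color map like Figure 2 would display.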

