A scalable method for analysis and display of DNA sequences.

Sirovich L, Stoeckle MY, Zhang Y - PLoS ONE (2009)

Bottom Line: A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.


Affiliation: Laboratory of Applied Mathematics, Mount Sinai School of Medicine, New York, New York, United States of America. lawrence.sirovich@mssm.edu

ABSTRACT

Background: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable.

Methodology/principal findings: Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy.

Conclusions/significance: The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.


pone-0007051-g001: Correlation among group-level indicator vectors. A false-color map depicting correlations among the indicator vectors for COI sequences of birds, fish, and butterflies is shown. The numerical correlation values are indicated.

Mentions: The first example considers COI sequences randomly drawn from three BOLD projects representing different groups of animals: birds, fish, and butterflies. An indicator vector was constructed for each of these sequence sets as described in Materials and Methods. Each indicator vector results from an optimization procedure that seeks a unit vector maximally correlated with a designated group and, simultaneously, minimally correlated with the remaining groups under consideration. In general, the results are collected in the structure matrix (Eq. 1), the elements of which furnish the correlation coefficients between groups. A false-color representation of the structure matrix provides a visual display of correlations among groups (Figure 1). These calculations indicated that the fish and bird vectors were well correlated, as might be expected for two classes of vertebrates, and that both were poorly correlated with the butterfly vector, consistent with more distant evolutionary relationships.
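
The idea of a matrix of correlations among unit-length group vectors can be illustrated with a minimal sketch. It does not reproduce the optimization procedure described above: as a plainly simpler stand-in, each group is represented by the normalized mean of its one-hot encoded sequences, and "correlation" is taken to be the dot product of unit vectors. The function names, the one-hot encoding, and the toy sequences are all assumptions made for illustration, not data or code from the article.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA string as a flat 0/1 vector of length 4 * len(seq)."""
    vec = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        if base in BASES:  # ambiguous symbols (e.g. N) are left as all zeros
            vec[i, BASES.index(base)] = 1.0
    return vec.ravel()

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def centroid_vectors(groups):
    """Stand-in for indicator vectors: one unit-length mean vector per group.

    This is NOT the optimization used in the article, only a simple proxy.
    """
    return {name: unit(np.mean([one_hot(s) for s in seqs], axis=0))
            for name, seqs in groups.items()}

def structure_matrix(vectors):
    """Pairwise dot products of the unit vectors; the diagonal is 1."""
    V = np.stack(list(vectors.values()))
    return V @ V.T

def assign(seq, vectors):
    """Return the name of the group whose vector best matches the sequence."""
    x = unit(one_hot(seq))
    return max(vectors, key=lambda name: float(vectors[name] @ x))

# Made-up, equal-length toy sequences (hypothetical, not COI data).
groups = {
    "group_a": ["ACGTACGT", "ACGTACGA"],
    "group_b": ["ACGTTCGT", "ACGATCGT"],
    "group_c": ["TTGACCAA", "TTGACCAT"],
}
vectors = centroid_vectors(groups)
S = structure_matrix(vectors)
```

In this toy setting, `S` is a symmetric 3 x 3 array that could be passed to a plotting routine such as Matplotlib's `imshow` to obtain a false-color display, and `assign("ACGTACGC", vectors)` assigns a sequence not used to build the vectors to the group with the largest dot product.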

