iPhy: an integrated phylogenetic workbench for supermatrix analyses.

Jones MO, Koutsovoulos GD, Blaxter ML - BMC Bioinformatics (2011)

Bottom Line: In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users. iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.


Affiliation: Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK. martin.jones@ed.ac.uk

ABSTRACT

Background: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use.

Results: Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTol to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users.

Conclusions: iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.



Figure 3: Analyses of slices from the Nematoda dataset. The figure shows the results of analyses of automatically selected taxon subsets from the Nematoda dataset using various criteria for a three-gene supermatrix. (A) most_chars species (one per order) with most characters for the three genes; (B) least_bias species showing the lowest base composition bias; (C) slowest_rate species with the inferred slowest overall rate of evolution. (D) For comparison, we show the tree derived from alignment of full length SSU rRNA sequences for twelve of the fourteen species included in the iPhy slices in parts (A), (B) and (C). For two of the species (Caenorhabditis sp. 5 and Ditylenchus africanus) no SSU rRNA sequence was available so we have included closely-related species (Caenorhabditis briggsae and Ditylenchus angustus). Clade membership sensu Blaxter 1998 [36] is shown on the tree. For each iPhy subset the figure shows, from left to right, the tree resulting from phylogenetic analysis; a heat map showing the AT content of each of the three genes; a stacked bar chart showing the number of characters for each gene. Scale bars above each tree show the branch length associated with 0.1 changes per site. Order names are given in parentheses. The keys at the bottom of the figure show, from left to right, the mapping of colours to AT content for the heatmap, and the mapping of colours to loci for the bar chart. The scale bar shows the length of bar representing 1000 characters.

Mentions: The subset where species were selected on the basis of the number of characters available (Figure 3A) contained the most characters, and was the only subset where every gene was present for every species. This subset had the highest standard deviation in AT content among species for all three loci, and produced the tree with the greatest overall length. The subsets where species were selected for low AT content bias (Figure 3B) and low evolutionary rate (Figure 3C) showed the expected AT content and tree length characteristics.
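The selection criteria named in Figure 3 can be illustrated with a small sketch: pick one species per order according to a scoring rule, then summarise the spread of AT content in the resulting subset. This is not iPhy's implementation. The record layout, the function names, the toy sequences, and in particular the definition of base composition bias (distance of the AT fraction from 0.5) are assumptions made purely for illustration.

```python
from statistics import pstdev

def char_count(seq):
    """Number of unambiguous bases (gaps, N and other symbols are ignored)."""
    return sum(b in "ACGT" for b in seq.upper())

def at_content(seq):
    """Fraction of A/T among unambiguous bases; 0.0 for an empty sequence."""
    n = char_count(seq)
    return sum(b in "AT" for b in seq.upper()) / n if n else 0.0

def most_chars(taxon):
    """Score: total characters over all genes (higher is better)."""
    return sum(char_count(s) for s in taxon["genes"].values())

def least_bias(taxon):
    """Score: negated distance of AT fraction from 0.5 (assumed bias proxy)."""
    joined = "".join(taxon["genes"].values())
    return -abs(at_content(joined) - 0.5)

def select_slice(taxa, score):
    """Keep one species per order: the highest scorer (first one wins ties)."""
    best = {}
    for taxon in taxa:
        order = taxon["order"]
        if order not in best or score(taxon) > score(best[order]):
            best[order] = taxon
    return best

def at_spread(selected, gene):
    """Population standard deviation of AT content for one gene in a slice."""
    values = [at_content(t["genes"][gene]) for t in selected.values()
              if char_count(t["genes"].get(gene, "")) > 0]
    return pstdev(values) if len(values) > 1 else 0.0

# Hypothetical toy records; species, orders and sequences are made up.
taxa = [
    {"species": "sp_A", "order": "order_1",
     "genes": {"g1": "ATATATGC", "g2": "ATAT"}},
    {"species": "sp_B", "order": "order_1",
     "genes": {"g1": "ACGTAC", "g2": ""}},
    {"species": "sp_C", "order": "order_2",
     "genes": {"g1": "GGCCAT--", "g2": "NNAT"}},
]

by_chars = select_slice(taxa, most_chars)
by_bias = select_slice(taxa, least_bias)
print({o: t["species"] for o, t in by_chars.items()})
print({o: t["species"] for o, t in by_bias.items()})
print(round(at_spread(by_chars, "g1"), 4))
```

In this toy data the two criteria choose different representatives for the same order, which is the kind of trade-off the figure explores: the subset with the most characters is not necessarily the one with the most even base composition.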

