An improved probability mapping approach to assess genome mosaicism.

Zhaxybayeva O, Gogarten JP - BMC Genomics (2003)

Bottom Line: The mapping of bootstrap support values from these extended datasets gives results similar to the original maximum likelihood and posterior probability mapping. Better taxon sampling combined with subtree analyses prevents the inconsistencies associated with four-taxon analyses, but retains the power of visual representation. Nevertheless, a case-by-case inspection of individual multi-taxon phylogenies remains necessary to differentiate unrecognized paralogy and shared phylogenetic reconstruction artifacts from horizontal gene transfer events.


Affiliation: Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Storrs, CT 06269-3125, USA. olga@carrot.mcb.uconn.edu

ABSTRACT

Background: Maximum likelihood and posterior probability mapping are useful visualization techniques for ascertaining the mosaic nature of prokaryotic genomes. However, posterior probabilities, especially when calculated for four-taxon cases, tend to overestimate the support for tree topologies. Furthermore, because of poor taxon sampling, four-taxon analyses are sensitive to the long branch attraction artifact. Here we extend the probability mapping approach by improving taxon sampling of the analyzed datasets, and by using bootstrap support values, a more conservative tool to assess reliability.

Results: Quartets of orthologous proteins were complemented with homologs from selected reference genomes. The mapping of bootstrap support values from these extended datasets gives results similar to the original maximum likelihood and posterior probability mapping. The more conservative nature of the plotted support values allows further analyses to focus on those protein families that strongly disagree with the majority or plurality of genes present in the analyzed genomes.

Conclusion: Posterior probability is a non-conservative measure for support, and posterior probability mapping only provides a quick estimation of phylogenetic information content of four genomes. This approach can be utilized as a pre-screen to select genes that might have been horizontally transferred. Better taxon sampling combined with subtree analyses prevents the inconsistencies associated with four-taxon analyses, but retains the power of visual representation. Nevertheless, a case-by-case inspection of individual multi-taxon phylogenies remains necessary to differentiate unrecognized paralogy and shared phylogenetic reconstruction artifacts from horizontal gene transfer events.


Figure 1: Posterior probability maps of genome quartets containing Synechocystis sp. Posterior probabilities were calculated according to the maximum likelihood mapping approach described in [4,17]. Tree topologies assigned to the vertices are depicted in New Hampshire tree format near the corresponding vertex of the triangle and should be considered unrooted tree topologies. The three numbers associated with each tree topology indicate how many QuartOPs fall into each of the three zones: "total" (i.e. the posterior probability for the tree topology is larger than the posterior probabilities for the other two topologies), 90% and 99% posterior probability, respectively. A) Genome quartet consisting of Synechocystis sp., Halobacterium sp., Aquifex aeolicus and Thermotoga maritima. The majority of the QuartOPs support the grouping of Halobacterium sp. with Synechocystis sp. B) Genome quartet consisting of Synechocystis sp., Archaeoglobus fulgidus, Aquifex aeolicus and Thermotoga maritima. The archaeon – Synechocystis sp. grouping is supported by fewer QuartOPs than in panel A.

Mentions: In [4] we described the analyses of several interdomain genome quartets. Some of the analyses were performed using a posterior probability mapping approach referred to as Maximum Likelihood (ML) mapping, a name that was coined in the original description of this approach [17]. We will use this term throughout the manuscript. In ML mapping, posterior probabilities are calculated from the maximum likelihood values (see [17] and [4] for the details). One noteworthy finding was that in the genome quartet including Synechocystis sp., Halobacterium sp., Aquifex aeolicus and Thermotoga maritima, the grouping of Halobacterium sp. with Synechocystis sp. was recovered by many more QuartOPs than the grouping expected from the 16S rRNA phylogeny (see Fig. 1A). Note that throughout the manuscript we refer to a particular tree by mentioning two species out of four (e.g., in this case the grouping of Halobacterium sp. with Synechocystis sp.); however, the trees are unrooted and therefore the grouping of the other two taxa is implied. To test whether this association was specific to Synechocystis sp., we repeated the analyses replacing Synechocystis sp. with Bacillus subtilis. The results were qualitatively the same (data not shown). To test the possibility that LBA [6] might be the reason for the strong support of Halobacterium sp. grouping with Synechocystis sp., we repeated the analyses replacing the Halobacterium sp. genome with that of Archaeoglobus fulgidus, another archaeon. The majority of QuartOPs supported the grouping of the thermophilic archaeon Archaeoglobus with the thermophilic bacteria Aquifex and Thermotoga (see Fig. 1B). In this study, we reanalyzed the above-mentioned genome quartets by adding homologous sequences from sixty reference genomes to each QuartOP, creating what we call "extended datasets". The dataflow is depicted in Figure 2.
For each extended dataset we obtained bootstrap support values for each of the three four-taxon "subtrees" and plotted these values in barycentric coordinates. Throughout this manuscript we use a graph theory definition of a subtree, i.e. "A tree G' whose graph vertices and graph edges form subsets of the graph vertices and graph edges of a given tree G" [18]. In particular, sequences (OTUs) included in the subtree are not required to be neighbors in the original tree. Subtrees defined according to these rules are different from subclades (see Figure 2 for an illustration). For example, if the topology ((A,D),(B,C)) is supported by a given bootstrap sample, this means that in the tree calculated from this sample the sequence from genome A groups closer to the one from D than to the one from B or C (Figure 2).
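The subtree idea and the barycentric plot can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each bootstrap tree is represented as a list of bipartitions (one side of every internal branch, given as a set of taxon names), the taxon labels A to F and the three example trees are invented, and the triangle vertices are placed at (0, 0), (1, 0) and (0.5, sqrt(3)/2). The sketch also assumes that the three support values are fractions of bootstrap samples, which it normalizes by their sum before plotting.

```python
import math
from collections import Counter

def induced_quartet(splits, quartet):
    """Return the four-taxon topology that a tree induces on `quartet`.
    `splits` lists one side of each internal-branch bipartition.
    The result is a frozenset of two taxon pairs, or None if no
    branch separates the four taxa two against two."""
    q = frozenset(quartet)
    for side in splits:
        inside = q & frozenset(side)
        if len(inside) == 2:
            return frozenset({inside, q - inside})
    return None

def quartet_support(bootstrap_trees, quartet):
    """Fraction of bootstrap trees supporting AB|CD, AC|BD and AD|BC."""
    a, b, c, d = quartet
    topologies = [
        frozenset({frozenset({a, b}), frozenset({c, d})}),
        frozenset({frozenset({a, c}), frozenset({b, d})}),
        frozenset({frozenset({a, d}), frozenset({b, c})}),
    ]
    counts = Counter(induced_quartet(t, quartet) for t in bootstrap_trees)
    return [counts[t] / len(bootstrap_trees) for t in topologies]

def barycentric_to_xy(support):
    """Map three support values to a point inside an equilateral triangle
    whose vertices stand for the three topologies."""
    s1, s2, s3 = support
    total = s1 + s2 + s3
    if total == 0:
        raise ValueError("at least one support value must be positive")
    x = (s2 + 0.5 * s3) / total
    y = (math.sqrt(3) / 2 * s3) / total
    return x, y

# Three hypothetical bootstrap trees on taxa A-F, written as bipartitions.
trees = [
    [{"A", "D"}, {"B", "C"}, {"B", "C", "F"}],  # ((A,D),E,(F,(B,C)))
    [{"A", "E"}, {"A", "D", "E"}, {"B", "C"}],  # (((A,E),D),F,(B,C))
    [{"A", "B"}, {"C", "D"}, {"C", "D", "F"}],  # ((A,B),E,(F,(C,D)))
]
support = quartet_support(trees, ("A", "B", "C", "D"))
print(support, barycentric_to_xy(support))
```

In the second example tree, A is a direct neighbor of E rather than D, yet the tree still counts toward ((A,D),(B,C)) because an internal branch separates A and D from B and C. This is the distinction between a subtree and a subclade drawn in the text.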

