Bootstrap-based support of HGT inferred by maximum parsimony.

Park HJ, Jin G, Nakhleh L - BMC Evol. Biol. (2010)

Bottom Line: An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples.

Affiliation: Department of Computer Science, Rice University, 6100 Main Street, MS 132, Houston, Texas 77005, USA.

ABSTRACT

Background: Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold.

Results: In this paper, we address this problem in a more systematic way, by proposing a nonparametric bootstrap-based measure of support of inferred reticulation events, and using it to determine the number of those events, as well as their placements. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples.

Conclusions: We have implemented our method in the NEPAL software tool (available publicly at http://bioinfo.cs.rice.edu/), and studied its performance on both biological and simulated data sets. While our studies show very promising results, they also highlight issues that are inherently challenging when applying the maximum parsimony criterion to detect reticulate evolution.

Figure 2: (a) A scenario where none of four HGT edges identified individually in 100 bootstrap samples has good support (the recipient of each of the four edges is the same node v in the species tree). (b) When combined, thus allowing for ambiguity in pinpointing the exact source, a well-supported hypothesis of an HGT emerges.

Mentions: Pinpointing the exact location of an HGT edge is very hard in practice, and this difficulty would be expected to affect the support of inferred HGT events. Indeed, our experimental results show that the support of an HGT edge, as given by Formula (2), tends to be very conservative, owing to the strict requirement that h_i and h be identical (see the Results and Discussion section). From our empirical analysis of the performance of MP, we found that the major cause of poor support for a correctly inferred HGT edge is that "neighbors" of the source may be as good candidates as the source itself under the MP criterion. We illustrate this in Figure 2. In the cartoon shown in Figure 2(a), four HGT edges, involving edge e as the recipient, were identified individually in 100 bootstrap samples, each with its associated support (out of 100). While none of them has good support, combined they produce a well-supported hypothesis of an HGT involving the clade, as shown in Figure 2(b). Empirically, we found that this process of introducing ambiguity in the source of an HGT edge often involves immediate neighbor edges of the source. In other words, we can refine Formula (2) to estimate the support of an edge h : D(X) → Y, where D(X) is a set of (neighboring) edges that correspond to potential sources, to obtain Formula (3).
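
The difference between the strict and the relaxed notion of support described above can be illustrated with a minimal Python sketch. It is not the NEPAL implementation, and Formulas (2) and (3) are not reproduced in this excerpt. The sketch assumes that strict support is the fraction of bootstrap samples in which the identical HGT edge was inferred, and that relaxed support counts a sample as a hit when any inferred edge has its source in D(X) and the given recipient. The function names, the edge labels (x1 to x4, y) and the per-edge counts are hypothetical and are not the values shown in Figure 2.

```python
from typing import Iterable, List, Set, Tuple

# An HGT edge is modelled as (source_edge_label, recipient_edge_label).
# This representation and all labels below are assumptions for illustration.
HGTEdge = Tuple[str, str]


def strict_support(samples: List[Set[HGTEdge]], edge: HGTEdge) -> float:
    """Fraction of bootstrap samples whose inferred HGT set contains exactly `edge`."""
    if not samples:
        raise ValueError("need at least one bootstrap sample")
    return sum(1 for inferred in samples if edge in inferred) / len(samples)


def neighborhood_support(
    samples: List[Set[HGTEdge]], sources: Iterable[str], recipient: str
) -> float:
    """Fraction of samples with at least one inferred edge from any source in
    `sources` (playing the role of D(X)) into `recipient`."""
    if not samples:
        raise ValueError("need at least one bootstrap sample")
    candidates = {(source, recipient) for source in sources}
    # A sample counts once, even if several candidate edges were inferred in it.
    hits = sum(1 for inferred in samples if inferred & candidates)
    return hits / len(samples)


# Hypothetical outcome of 100 bootstrap replicates: four neighboring sources
# compete for the same recipient "y"; 10 replicates infer no HGT at all.
samples: List[Set[HGTEdge]] = (
    [{("x1", "y")}] * 30
    + [{("x2", "y")}] * 25
    + [{("x3", "y")}] * 20
    + [{("x4", "y")}] * 15
    + [set()] * 10
)

best_single = max(strict_support(samples, (s, "y")) for s in ("x1", "x2", "x3", "x4"))
combined = neighborhood_support(samples, ["x1", "x2", "x3", "x4"], "y")
print(f"best single-edge support: {best_single:.2f}")
print(f"combined neighborhood support: {combined:.2f}")
```

In this made-up example no single edge is recovered in more than 30 of the 100 replicates, while the combined hypothesis "some source in the neighborhood transferred into y" is recovered in 90 of them, which mirrors the situation the cartoon in Figure 2 describes.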

