Inference of reticulate evolutionary histories by maximum likelihood: the performance of information criteria.

Park HJ, Nakhleh L - BMC Bioinformatics (2012)

Bottom Line: We find that both the evolutionary diameter and the height of a reticulation event, particularly the diameter, have a significant effect on its identifiability under ML. Our results demonstrate that BIC provides a good framework for inferring reticulate evolutionary histories. Nevertheless, the results call for caution when interpreting the accuracy of the inference, especially for data sets with particular evolutionary features.


Affiliation: Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA. hjpark@bcm.edu

ABSTRACT

Background: Maximum likelihood has been widely used for over three decades to infer phylogenetic trees from molecular data. When reticulate evolutionary events occur, several genomic regions may have conflicting evolutionary histories, and a phylogenetic network may provide a more adequate model for representing the evolutionary history of the genomes or species. A maximum likelihood (ML) model has been proposed for this case and accounts for both mutation within a genomic region and reticulation across the regions. However, the performance of this model in terms of inferring information about reticulate evolution and properties that affect this performance have not been studied.

Results: In this paper, we study the effect of the evolutionary diameter and height of a reticulation event on its identifiability under ML. We find both of them, particularly the diameter, have a significant effect. Further, we find that the number of genes (which can be generalized to the concept of "non-recombining genomic regions") that are transferred across a reticulation edge affects its detectability. Last but not least, a fundamental challenge with phylogenetic networks is that they allow an arbitrary level of complexity, giving rise to the model selection problem. We investigate the performance of two information criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), for addressing this problem. We find that BIC performs well in general for controlling the model complexity and preventing ML from grossly overestimating the number of reticulation events.

Conclusion: Our results demonstrate that BIC provides a good framework for inferring reticulate evolutionary histories. Nevertheless, the results call for caution when interpreting the accuracy of the inference, especially for data sets with particular evolutionary features.
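
To make the model selection step concrete, here is a minimal sketch of how AIC and BIC trade goodness of fit against complexity, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The candidate names, log-likelihoods, parameter counts, and the sample size n below are hypothetical values chosen for illustration only. How the paper counts parameters per reticulation and what it uses for n are not stated in the abstract, so both are assumptions here.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood, num_params, num_observations):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_observations) - 2.0 * log_likelihood


def select_model(candidates, num_observations, criterion="bic"):
    """Return the name of the candidate with the lowest criterion score.

    candidates: list of (name, maximized log-likelihood, number of parameters).
    """
    def score(candidate):
        _, log_lik, k = candidate
        if criterion == "bic":
            return bic(log_lik, k, num_observations)
        return aic(log_lik, k)

    return min(candidates, key=score)[0]


# Hypothetical candidates: each added reticulation is assumed to cost
# two extra parameters and to improve the fit slightly.
candidates = [
    ("0 reticulations", -1050.0, 20),
    ("1 reticulation", -1040.0, 22),
    ("2 reticulations", -1037.0, 24),
]
n = 1000  # assumed number of observations

print("AIC selects:", select_model(candidates, n, criterion="aic"))
print("BIC selects:", select_model(candidates, n, criterion="bic"))
```

With these made-up numbers, AIC prefers the two-reticulation model while BIC stops at one, because the BIC penalty per parameter (ln n) exceeds the AIC penalty (2) whenever n is larger than about 7.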


Figure 2: (a) Three evolutionary histories, each involving the same underlying tree (black lines) and a single reticulation edge from the set of three reticulation edges 1, 2, and 3. The diameters of the three reticulation edges 1, 2, 3 are 0.5, 1.0, and 1.5, respectively. (b,c) The performance of ML for estimating the inheritance probabilities on data simulated with a single reticulation event. The genome size corresponds to the number of gene data sets used in the inference. Each panel contains three segments, corresponding to three different values of true inheritance probabilities: 0.1, 0.3, and 0.5. The inheritance probabilities γe were estimated using Eq. (4). The two diameters of d = 0.5 (b) and d = 1.5 (c) correspond to the two networks of (a), with HGT edges 1 and 3, respectively; results for the third network are omitted due to space limitations. See text for more details.

Mentions: In our second set of experiments, we set out to investigate how ML performs in terms of identifying the location of a reticulation edge as well as the inheritance probability that indicates the fraction of genes (non-recombining regions) that were transferred across that edge. We considered three independent evolutionary scenarios, each involving a single reticulation edge of a certain diameter, as shown in Fig. 2(a). All three reticulation edges have the same height and agree on the donor node, yet differ in terms of recipient node, and consequently the diameter. Each of the three resulting networks contains exactly two trees: (1) Network N1, which is formed by adding only reticulation edge 1 to the underlying tree T; this network contains the two trees T and T1, where T1 differs from T only by placing taxon 2 as a sister taxon of 3; (2) Network N2, which is formed by adding only reticulation edge 2 to the underlying tree T; this network contains the two trees T and T2, where T2 differs from T only by placing taxon 4 as a sister taxon of 3; and (3) Network N3, which is formed by adding only reticulation edge 3 to the underlying tree T; this network contains the two trees T and T3, where T3 differs from T only by placing taxon 7 as a sister taxon of 3.
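
Because each network here contains exactly two trees, one way to picture the inheritance probability is as the mixing weight in a two-component mixture over genes. The sketch below is not the paper's Eq. (4), which is not reproduced in this excerpt. It assumes that per-gene likelihoods under T and under the alternative tree (called T1 here) are already available, that genes are independent, and that γ is the weight on the tree that uses the reticulation edge. The function names and the input values are hypothetical.

```python
import math


def mixture_log_likelihood(gamma, lik_T, lik_T1):
    """Log-likelihood of all genes when each gene follows T1 with
    probability gamma and T with probability 1 - gamma."""
    return sum(
        math.log((1.0 - gamma) * a + gamma * b)
        for a, b in zip(lik_T, lik_T1)
    )


def estimate_gamma(lik_T, lik_T1, iterations=200, gamma=0.5):
    """Estimate the mixing weight gamma by expectation-maximization."""
    for _ in range(iterations):
        # E-step: posterior probability that each gene evolved along T1.
        responsibilities = [
            gamma * b / ((1.0 - gamma) * a + gamma * b)
            for a, b in zip(lik_T, lik_T1)
        ]
        # M-step: the new gamma is the average responsibility.
        gamma = sum(responsibilities) / len(responsibilities)
    return gamma


# Hypothetical per-gene likelihoods: 7 genes favor T, 3 genes favor T1.
lik_T = [0.9] * 7 + [0.1] * 3
lik_T1 = [0.1] * 7 + [0.9] * 3

gamma_hat = estimate_gamma(lik_T, lik_T1)
print("estimated gamma:", round(gamma_hat, 4))
```

In this toy input the estimate settles at 0.25 rather than the raw 3/10 split, since the weaker support each gene gives to its less likely tree still counts in the likelihood. With real data, the per-gene likelihoods would come from a phylogenetic likelihood computation on each gene's alignment, which is outside the scope of this sketch.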

