Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

Pardi F, Scornavacca C - PLoS Comput. Biol. (2015)

Bottom Line: This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

Affiliation: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.

ABSTRACT
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
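To make the notion of displayed trees concrete, here is a minimal Python sketch of its topological part. Branch lengths, which the abstract notes can partially restore identifiability, are ignored. It assumes a network is given as a list of (parent, child) edges of a rooted acyclic graph whose childless nodes are the taxa. Each displayed tree is obtained by keeping one incoming edge per reticulation, pruning dead ends and suppressing unary nodes. The functions `displayed_trees` and `indistinguishable` and the toy networks are hypothetical illustrations, not code from the paper. The enumeration is exponential in the number of reticulations, so it suits only small examples.

```python
from itertools import product


def _collect_clusters(node, kids, taxa, out):
    """Append the cluster (set of taxa below) of `node` and of all its
    descendants to `out`, and return the cluster of `node`."""
    if node in taxa:
        cluster = frozenset([node])
    else:
        cluster = frozenset().union(
            *(_collect_clusters(k, kids, taxa, out) for k in kids.get(node, ()))
        )
    out.append(cluster)
    return cluster


def displayed_trees(edges):
    """Return the set of tree topologies displayed by a rooted network.

    `edges` lists (parent, child) pairs of a rooted DAG. Childless nodes are
    the taxa; the unique parentless node is the root. Each displayed tree is
    encoded by its set of non-empty clusters, which determines a rooted tree
    topology once dead ends are pruned and unary nodes suppressed.
    """
    edges = list(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, []).append(p)
    nodes = {v for edge in edges for v in edge}
    taxa = nodes - {p for p, _ in edges}
    roots = nodes - set(parents)
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    root = roots.pop()
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]

    trees = set()
    # Each displayed tree keeps exactly one incoming edge per reticulation.
    for choice in product(*(parents[h] for h in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        kids = {}
        for p, c in edges:
            # Non-reticulation edges are always kept; for a reticulation,
            # only the edge from the chosen parent survives.
            if kept_parent.get(c, p) == p:
                kids.setdefault(p, []).append(c)
        clusters = []
        _collect_clusters(root, kids, taxa, clusters)
        # Empty clusters belong to dead ends (pruned); a unary node repeats
        # its child's cluster, so collecting into a set suppresses it.
        trees.add(frozenset(cl for cl in clusters if cl))
    return trees


def indistinguishable(net1, net2):
    """Hypothetical helper: same displayed tree topologies (lengths ignored)."""
    return displayed_trees(net1) == displayed_trees(net2)


# Hypothetical toy networks on taxa a, b, c.
tree_T = [("r", "x"), ("r", "c"), ("x", "a"), ("x", "b")]
net_tri = [("r", "u"), ("r", "c"), ("u", "v"), ("u", "h"),
           ("v", "h"), ("v", "a"), ("h", "b")]
net_hyb = [("r", "x"), ("r", "y"), ("x", "a"), ("x", "h"),
           ("y", "h"), ("y", "c"), ("h", "b")]

print(indistinguishable(tree_T, net_tri))  # True
print(len(displayed_trees(net_hyb)))       # 2
```

Here `net_tri` contains a reticulation yet displays only the tree ((a,b),c), so a method that scores networks solely through their displayed trees cannot tell it apart from the tree `tree_T`. By contrast, `net_hyb` displays both ((a,b),c) and (a,(b,c)) and is therefore distinguishable from `tree_T`.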

Fig. 6. Comparison between the reduced version and the canonical form of a network. N1 is the network topology in Fig. 15a of [46], where edges leading to extinct taxa are shown in grey, and reticulation events are represented by horizontal lines connecting the involved edges. N2 is a phylogenetic network on the same set of taxa that represents the same evolutionary history and also shows edge lengths. R(N1) is the reduced version of N1 (Fig. 15b of [46]). Also shown is the canonical form of N2. Comparing R(N1) with this canonical form reveals the difference in expressive power between reduced versions and canonical forms. Collapsing the edge above c and d in R(N1) yields the regular network that displays the same tree topologies as N1 and N2. Clearly, the reduced form R(N1) (and the regular form) retains less of the complexity of the original network N1 than the canonical form does. For example, R(N1) shows no sign of the reticulate events ancestral to taxon e.

Mentions: Moret et al. [46] defined notions of reconstructible, indistinguishable and reduced networks that resemble concepts we introduce here. Although some of their results were flawed [47, 50], some of the arguments in this introduction are inspired by their paper. Particularly relevant to the present work is their reduction algorithm, which transforms a network into its reduced version. (However, the exact definition of the reduced version is unclear: as one of the authors later pointed out [47], “the reduction procedure of Moret et al. [46] is, in fact, inaccurate” and “in this paper we do not attempt to fix the procedure”.) The reduced version is analogous to our canonical form, since the authors claim that networks displaying the same tree topologies have the same reduced version (up to isomorphism; Theorem 2 in [46]). This is a somewhat weaker analogue of one of our results (Corollary 1): weaker because it does not claim the converse, namely that networks with the same reduced version display the same tree topologies. To illustrate the difference between our canonical form and the reduced version of Moret and colleagues, Fig. 6 compares the canonical form and the reduced version of the same network N1. (N1 and its reduced version are taken from Fig. 15 of [46] to avoid possible issues with the reduction algorithm.) As the figure shows, the canonical form retains more of the complexity of the original network.
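The logical gap between the two results can be written schematically. The notation is assumed only for this illustration: \mathcal{T}(N) is the set of trees displayed by a network N, R(N) its reduced version in the sense of [46], C(N) a placeholder name for its canonical form, and \cong denotes isomorphism. The second line is meant to hold only under the assumptions of Corollary 1, which are not spelled out in this section.

```latex
% Theorem 2 of [46]: one direction only
\mathcal{T}(N_1) = \mathcal{T}(N_2) \;\Longrightarrow\; R(N_1) \cong R(N_2)

% Corollary 1 (under its assumptions): both directions
\mathcal{T}(N_1) = \mathcal{T}(N_2) \;\Longleftrightarrow\; C(N_1) \cong C(N_2)
```

Only the biconditional lets a single network stand in for an entire set of indistinguishable networks. With a one-way implication, two networks could share a reduced version while still displaying different trees.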

