Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

Pardi F, Scornavacca C - PLoS Comput. Biol. (2015)

Bottom Line: This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.


Affiliation: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.

ABSTRACT
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
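To make "the trees a network displays" concrete, here is a minimal Python sketch built on assumptions of our own, not the paper's code. A network is encoded as a dict mapping directed edges (parent, child) to integer branch lengths. A displayed tree is obtained by keeping exactly one incoming edge per reticulation. Trees are compared as maps from clusters (leaf sets) to total branch length, which amounts to suppressing nodes of indegree and outdegree 1 and pruning branches that reach no leaf. The names `displayed_trees`, `leaves_below`, `N_a` and `N_b` are hypothetical, and the two toy networks are not those of Fig. 3.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding: a rooted network is a dict mapping each directed edge
# (parent, child) to an integer branch length; leaves are nodes with no children.


def leaves_below(v, kids, leaves, memo):
    """Leaf set reachable from v using only the kept edges in `kids`."""
    if v not in memo:
        if v in leaves:
            memo[v] = frozenset([v])
        else:
            memo[v] = frozenset().union(
                *(leaves_below(c, kids, leaves, memo) for c in kids[v]))
    return memo[v]


def displayed_trees(edges):
    """Return the set of trees (with branch lengths) displayed by `edges`.

    Each tree is a frozenset of (cluster, length) pairs: the length of a
    cluster is the sum of all kept edges lying above exactly that leaf set.
    """
    parents, children = defaultdict(list), defaultdict(list)
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)
    nodes = set(parents) | set(children)
    leaves = {v for v in nodes if v not in children}
    retics = sorted(v for v in nodes if len(parents[v]) > 1)

    trees = set()
    # Every combination of "one incoming edge per reticulation".
    for choice in product(*(parents[v] for v in retics)):
        keep = dict(zip(retics, choice))
        # Non-reticulation children keep their (only) incoming edge.
        kept = [(p, c) for p, c in edges if keep.get(c, p) == p]
        kids = defaultdict(list)
        for p, c in kept:
            kids[p].append(c)
        memo, lengths = {}, defaultdict(int)
        for p, c in kept:
            cluster = leaves_below(c, kids, leaves, memo)
            if cluster:  # skip branches that lead to no leaf
                lengths[cluster] += edges[(p, c)]
        trees.add(frozenset(lengths.items()))
    return trees


# Two different (hypothetical) networks: in N_b, the reticulation h of
# outdegree 1 from N_a has been merged into leaf y.
N_a = {("r", "u"): 1, ("r", "w"): 1, ("u", "x"): 2, ("w", "z"): 2,
       ("u", "h"): 1, ("w", "h"): 3, ("h", "y"): 2}
N_b = {("r", "u"): 1, ("r", "w"): 1, ("u", "x"): 2, ("w", "z"): 2,
       ("u", "y"): 3, ("w", "y"): 5}
print(N_a != N_b, displayed_trees(N_a) == displayed_trees(N_b))  # True True
```

Integer lengths are a deliberate choice here: they keep the summed branch lengths exact, so equal trees compare equal without any floating-point tolerance.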



Fig. 4. Canonical form of N1 and N2 in Fig. 3.

Mentions: In fact, it is not difficult to construct other examples of indistinguishable networks: whenever a network has a reticulation v with a single outgoing edge (i.e. outdegree 1), we can shorten this edge by some Δλ (no greater than its length) and lengthen each edge ending in v by the same Δλ, without altering the set of trees displayed by the network. Note that this operation, which we refer to as “unzipping” reticulation v, can make v coincide with a speciation node or a leaf when Δλ equals the length of the edge leaving v. For example, one may fully unzip the two reticulation nodes of N1 in Fig. 3, obtaining the network N′ of Fig. 4. As expected, N1 and N′ display the same set of trees ({T1, T2, T3}) and are thus indistinguishable. More interestingly, fully unzipping the two reticulations of N2 (the other network in Fig. 3, which also displays {T1, T2, T3}) yields N′ again. As we shall see, this is not a coincidence: the unzipping transformations described above lead to what we call the canonical form of a network, and under mild assumptions two networks are indistinguishable if and only if they have the same canonical form (e.g. N1 and N2 in Fig. 3 have the same canonical form N′; formal definitions and statements are given in the Results section).
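The unzipping operation can be sketched in Python under a hypothetical encoding of our own: a network is a dict mapping directed edges (parent, child) to integer branch lengths. `unzip` moves Δλ (the `delta` parameter) from the single edge leaving v onto every edge entering v. When Δλ equals the outgoing length, it merges v into its child. `canonical_form` simply repeats full unzips until no reticulation of outdegree 1 remains. This follows the informal description above rather than the formal definition in the Results section, and it does not check the paper's "mild assumptions". All names, including the toy networks `N_a` and `N_b` (not those of Fig. 3), are illustrative.

```python
from collections import defaultdict

# Hypothetical encoding: a rooted network is a dict mapping each directed edge
# (parent, child) to an integer branch length (integers keep sums exact).


def unzip(edges, v, delta):
    """Shift `delta` units of length from the single edge leaving reticulation
    v onto every edge entering v; return a new edge dict."""
    parents = [p for p, c in edges if c == v]
    children = [c for p, c in edges if p == v]
    if len(parents) < 2 or len(children) != 1:
        raise ValueError(f"{v!r} is not a reticulation of outdegree 1")
    (child,) = children
    out_len = edges[(v, child)]
    if not 0 <= delta <= out_len:
        raise ValueError("delta must lie in [0, length of the outgoing edge]")
    new = dict(edges)
    for p in parents:
        new[(p, v)] += delta
    if delta < out_len:  # partial unzip
        new[(v, child)] -= delta
        return new
    # Full unzip: (v, child) now has length 0, so v coincides with its child.
    # Redirect the edges entering v to the child and drop v.
    del new[(v, child)]
    for p in parents:
        if (p, child) in new:
            raise ValueError("merging v into its child would create parallel edges")
        new[(p, child)] = new.pop((p, v))
    return new


def canonical_form(edges):
    """Fully unzip reticulations of outdegree 1 until none remains."""
    edges = dict(edges)
    while True:
        indeg, outdeg = defaultdict(int), defaultdict(int)
        for p, c in edges:
            outdeg[p] += 1
            indeg[c] += 1
        targets = sorted(v for v in indeg if indeg[v] > 1 and outdeg[v] == 1)
        if not targets:
            return edges
        v = targets[0]  # deterministic choice; each full unzip removes one node
        (child,) = [c for p, c in edges if p == v]
        edges = unzip(edges, v, edges[(v, child)])


N_a = {("r", "u"): 1, ("r", "w"): 1, ("u", "x"): 2, ("w", "z"): 2,
       ("u", "h"): 1, ("w", "h"): 3, ("h", "y"): 2}
N_b = {("r", "u"): 1, ("r", "w"): 1, ("u", "x"): 2, ("w", "z"): 2,
       ("u", "y"): 3, ("w", "y"): 5}
print(canonical_form(N_a) == N_b)            # True
print(canonical_form(N_b) == N_b)            # True
print(unzip(N_a, "h", 1)[("h", "y")])        # 1
```

In this toy case, unzipping h fully turns leaf y into the endpoint of two edges, of lengths 1 + 2 and 3 + 2. Both networks therefore share one canonical form, consistent with the claim that indistinguishable networks have the same canonical form.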
