Limits...
Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

Pardi F, Scornavacca C - PLoS Comput. Biol. (2015)

Bottom Line: This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem.For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set.While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

View Article: PubMed Central - PubMed

Affiliation: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.

ABSTRACT
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

No MeSH data available.


Related in: MedlinePlus

Examples of networks that can be uniquely recovered from the data they generate, despite being excluded by many proposed definitions of reconstructible networks.(a) A network N, and its canonical form N′, whose topologies are not galled trees, nor galled, tree-child, tree-sibling or regular networks, nor networks with visible reticulations. N, however, is uniquely determined by the trees it displays, with the exception of x, y and z, which can assume any value between 0 and 0.1. Because of the impossibility to determine these values, the canonical form N′ has the corresponding edges collapsed. As N′ is a network in canonical form satisfying the mild conditions of Corollary 2, N′ is uniquely determined by the trees it displays. Note that N provides the biological interpretation for N′. (b) The network topology of N′′ is such that there exists no regular network displaying the same set of (two) tree topologies as N′′. Thus, restricting the scope of phylogenetic inference to regular networks would be very limiting. In our framework, N′′ is a network in canonical form and thus uniquely determined by the trees it displays.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4388854&req=5

pcbi.1004135.g005: Examples of networks that can be uniquely recovered from the data they generate, despite being excluded by many proposed definitions of reconstructible networks.(a) A network N, and its canonical form N′, whose topologies are not galled trees, nor galled, tree-child, tree-sibling or regular networks, nor networks with visible reticulations. N, however, is uniquely determined by the trees it displays, with the exception of x, y and z, which can assume any value between 0 and 0.1. Because of the impossibility to determine these values, the canonical form N′ has the corresponding edges collapsed. As N′ is a network in canonical form satisfying the mild conditions of Corollary 2, N′ is uniquely determined by the trees it displays. Note that N provides the biological interpretation for N′. (b) The network topology of N′′ is such that there exists no regular network displaying the same set of (two) tree topologies as N′′. Thus, restricting the scope of phylogenetic inference to regular networks would be very limiting. In our framework, N′′ is a network in canonical form and thus uniquely determined by the trees it displays.

Mentions: Our goals are more basic: starting from the observation that not all phylogenetic networks are identifiable, since many of them are mutually indistinguishable with most inference approaches, we aim to define a class of networks that is (existence goal) large enough that every phylogenetic network has an equivalent (i.e. indistinguishable) network within this class and (distinguishability goal) small enough that no two networks within this class are indistinguishable. From our standpoint, the computationally-motivated definitions above are at the same time too broad and too restrictive. Too broad, because they determine a set of networks that includes many pairs of indistinguishable networks: for example the three indistinguishable networks in Fig. 2 are all galled trees—and thus belong to every single one of the classes mentioned above (which are all generalizations of galled trees). Too restrictive, because these classes of networks do not include simple networks that it should be possible to reconstruct from real data. For example, Fig. 5a shows a network N with edge lengths that is not tree-sibling, nor has the visible property, and thus is not galled, nor tree-child (for definitions, see [1]), but which in practice should be reconstructible: apart from the lengths of three edges (x, y, z), N is uniquely determined by the trees that it displays (a consequence of the formal results that we will show in the following), meaning that, given large amounts of data strongly supporting each of these (seven) trees with their correct edge lengths, any method for network inference properly accounting for edge lengths (e.g. based on ML) should be able to reconstruct N, or its canonical form N′.


Reconstructible phylogenetic networks: do not distinguish the indistinguishable.

Pardi F, Scornavacca C - PLoS Comput. Biol. (2015)

Examples of networks that can be uniquely recovered from the data they generate, despite being excluded by many proposed definitions of reconstructible networks.(a) A network N, and its canonical form N′, whose topologies are not galled trees, nor galled, tree-child, tree-sibling or regular networks, nor networks with visible reticulations. N, however, is uniquely determined by the trees it displays, with the exception of x, y and z, which can assume any value between 0 and 0.1. Because of the impossibility to determine these values, the canonical form N′ has the corresponding edges collapsed. As N′ is a network in canonical form satisfying the mild conditions of Corollary 2, N′ is uniquely determined by the trees it displays. Note that N provides the biological interpretation for N′. (b) The network topology of N′′ is such that there exists no regular network displaying the same set of (two) tree topologies as N′′. Thus, restricting the scope of phylogenetic inference to regular networks would be very limiting. In our framework, N′′ is a network in canonical form and thus uniquely determined by the trees it displays.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4388854&req=5

pcbi.1004135.g005: Examples of networks that can be uniquely recovered from the data they generate, despite being excluded by many proposed definitions of reconstructible networks.(a) A network N, and its canonical form N′, whose topologies are not galled trees, nor galled, tree-child, tree-sibling or regular networks, nor networks with visible reticulations. N, however, is uniquely determined by the trees it displays, with the exception of x, y and z, which can assume any value between 0 and 0.1. Because of the impossibility to determine these values, the canonical form N′ has the corresponding edges collapsed. As N′ is a network in canonical form satisfying the mild conditions of Corollary 2, N′ is uniquely determined by the trees it displays. Note that N provides the biological interpretation for N′. (b) The network topology of N′′ is such that there exists no regular network displaying the same set of (two) tree topologies as N′′. Thus, restricting the scope of phylogenetic inference to regular networks would be very limiting. In our framework, N′′ is a network in canonical form and thus uniquely determined by the trees it displays.
Mentions: Our goals are more basic: starting from the observation that not all phylogenetic networks are identifiable, since many of them are mutually indistinguishable with most inference approaches, we aim to define a class of networks that is (existence goal) large enough that every phylogenetic network has an equivalent (i.e. indistinguishable) network within this class and (distinguishability goal) small enough that no two networks within this class are indistinguishable. From our standpoint, the computationally-motivated definitions above are at the same time too broad and too restrictive. Too broad, because they determine a set of networks that includes many pairs of indistinguishable networks: for example the three indistinguishable networks in Fig. 2 are all galled trees—and thus belong to every single one of the classes mentioned above (which are all generalizations of galled trees). Too restrictive, because these classes of networks do not include simple networks that it should be possible to reconstruct from real data. For example, Fig. 5a shows a network N with edge lengths that is not tree-sibling, nor has the visible property, and thus is not galled, nor tree-child (for definitions, see [1]), but which in practice should be reconstructible: apart from the lengths of three edges (x, y, z), N is uniquely determined by the trees that it displays (a consequence of the formal results that we will show in the following), meaning that, given large amounts of data strongly supporting each of these (seven) trees with their correct edge lengths, any method for network inference properly accounting for edge lengths (e.g. based on ML) should be able to reconstruct N, or its canonical form N′.

Bottom Line: This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem.For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set.While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

View Article: PubMed Central - PubMed

Affiliation: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.

ABSTRACT
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

No MeSH data available.


Related in: MedlinePlus