Gene trees and species trees: irreconcilable differences.

Swenson KM, El-Mabrouk N - BMC Bioinformatics (2012)

Affiliation: Département d'Informatique, DIRO, Université de Montréal, H3C 3J7 Canada. swensonk@iro.umontreal.ca

ABSTRACT

Background: Reconciliation is the classical method for inferring a duplication and loss history from a set of extant genes. It is based on embedding the gene tree into the species tree, with incongruence between the two taken as evidence of duplications and losses. However, the results obtained by this method depend heavily on the species and gene trees considered. Painstaking attention has therefore been given to developing methods for reconstructing accurate gene trees.

Results: This paper highlights the fact that errors in gene trees are not the only reason for inferring an erroneous duplication-loss history. More precisely, we prove that, under certain reasonable hypotheses based on the widely accepted link between function and sequence constraints, even a well-supported gene tree can yield a reconciliation that does not correspond to the true history. We then provide the theoretical underpinnings for a conservative approach to inferring histories from such gene trees. We apply our method to the mammalian interleukin-1 (IL) gene tree, which has been used as a model example to illustrate the role of reconciliation.

Figure 1: S is a species tree for Σ = {1, 2, 3}. H represents a history consistent with (i.e. embedded in) the species tree S, with one duplication event preceding the speciation event leading to genomes 2 and 3. Speciation events appear as bifurcations at obtuse angles, while duplication events appear at right angles. We represent isorthology by positioning the retainer of parental function directly under the parental gene. Moreover, we label isorthologs with the same letter (all a's are pairwise isorthologous, as are all b's). P is the phylogeny for the gene family Γ = {a1, a2, a3, b2, b3} corresponding to H; it is the same tree as H, embedded differently (with its edges uncrossed). G is the gene tree, respecting the isolocalization property, that is likely to be obtained for a gene family Γ that evolved according to H. Internal node labels of S, G, and P correspond to the LCA mapping; squares mark duplication nodes and circles mark speciation nodes resulting from the mapping. R is the reconciliation corresponding to the mapping; a dotted line indicates the lost lineage.
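The LCA mapping and the duplication/loss labelling described in the caption can be sketched in a few lines of Python. This is a minimal illustration of the standard LCA-based reconciliation, not the authors' implementation. It assumes trees are nested 2-tuples of leaf labels and identifies each species-tree node with its set of leaves; the topology used for G, ((a1,(a2,a3)),(b2,b3)), is our own reading of the figure and is not stated in the text. A gene-tree node is labelled a duplication when it maps to the same species-tree node as one of its children, and losses are counted with the usual rule that every species-tree level skipped along a gene-tree edge implies one loss.

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label, or a (left, right) pair
Cluster = FrozenSet[str]                   # a species-tree node, named by its leaf set


def leaf_set(t: Tree) -> Cluster:
    """Set of leaf labels below t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


def cluster_depths(s: Tree, depth: int = 0,
                   out: Optional[Dict[Cluster, int]] = None) -> Dict[Cluster, int]:
    """Map every node (cluster) of the species tree s to its depth; the root has depth 0."""
    if out is None:
        out = {}
    out[leaf_set(s)] = depth
    if not isinstance(s, str):
        for child in s:
            cluster_depths(child, depth + 1, out)
    return out


def lca(species: Cluster, depth: Dict[Cluster, int]) -> Cluster:
    """LCA in S: clusters containing `species` lie on one root path, so take the deepest."""
    return max((c for c in depth if species <= c), key=depth.__getitem__)


def reconcile(g: Tree, species_of: Dict[str, str],
              depth: Dict[Cluster, int]) -> Tuple[Cluster, int, int]:
    """Return (LCA image of g's root, #duplications, #losses) for the gene tree g."""
    if isinstance(g, str):
        return frozenset([species_of[g]]), 0, 0
    (ml, dl, ll), (mr, dr, lr) = (reconcile(c, species_of, depth) for c in g)
    m = lca(ml | mr, depth)
    is_dup = m in (ml, mr)  # same image as a child => duplication node
    # Losses on each child edge = species-tree levels skipped along it; a speciation
    # node itself accounts for one level, a duplication node for none.
    step = 0 if is_dup else 1
    losses = ll + lr + sum(depth[mc] - depth[m] - step for mc in (ml, mr))
    return m, dl + dr + int(is_dup), losses


S = ("1", ("2", "3"))
P = ("a1", (("a2", "a3"), ("b2", "b3")))  # true phylogeny, read off H
G = (("a1", ("a2", "a3")), ("b2", "b3"))  # assumed reading of the isolocalized gene tree
SPECIES_OF = {"a1": "1", "a2": "2", "a3": "3", "b2": "2", "b3": "3"}  # s(x_i) = i
DEPTH = cluster_depths(S)

for name, tree in (("P", P), ("G", G)):
    root, dups, losses = reconcile(tree, SPECIES_OF, DEPTH)
    print(name, sorted(root), dups, losses)
# P ['1', '2', '3'] 1 0
# G ['1', '2', '3'] 1 1
```

Under these assumptions, P yields one duplication (below the first speciation, as in H) and no loss, whereas G yields one duplication at the root plus one loss, which illustrates how a well-supported gene tree can produce a reconciliation that differs from the true history.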

Mentions: For our purposes, a genome is just a collection of genes. A phylogeny is a rooted binary tree whose leaves are uniquely labeled by some set. A species tree S is a phylogeny over a set of species Σ, representing the evolutionary relationships among these species. Similarly, we can consider the evolutionary relationships among a family of genes Γ that appear in the genomes of Σ: a gene tree G for Γ is a phylogeny accompanied by a function s : Γ → Σ indicating the species in which each gene is found. It is reasonable to assume that there is at least one gene per species in S. In Figure 1, the tree S is the species tree for Σ = {1, 2, 3}, and G and P are two possible gene trees for Γ = {a1, a2, a3, b2, b3}, where s(x_i) = i for x ∈ {a, b}.
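The definitions above can be checked mechanically. The sketch below assumes trees are represented as nested 2-tuples of leaf labels (our choice, not the paper's), and the function names are hypothetical. It verifies that a tree is a phylogeny (binary, with distinct leaf labels) and that a map s is defined on exactly the genes Γ of the gene tree and maps them onto Σ, so that every species carries at least one gene. The gene tree G used in the example is our assumed reading of Figure 1.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label, or a (left, right) pair


def leaf_labels(t: Tree) -> List[str]:
    """Leaf labels of t, keeping repeats; raises ValueError on a non-binary node."""
    if isinstance(t, str):
        return [t]
    if not (isinstance(t, tuple) and len(t) == 2):
        raise ValueError(f"not a binary node: {t!r}")
    return leaf_labels(t[0]) + leaf_labels(t[1])


def is_phylogeny(t: Tree) -> bool:
    """A rooted binary tree whose leaves are uniquely labeled."""
    labels = leaf_labels(t)
    return len(labels) == len(set(labels))


def is_gene_tree(g: Tree, s: Dict[str, str], species_tree: Tree) -> bool:
    """Check that (g, s) is a gene tree for the species Σ of species_tree."""
    if not (is_phylogeny(g) and is_phylogeny(species_tree)):
        return False
    sigma = set(leaf_labels(species_tree))  # Σ
    gamma = set(leaf_labels(g))             # Γ
    # s must be defined on exactly Γ, and its image must be all of Σ:
    # every gene lies in a known species and every species has at least one gene.
    return set(s) == gamma and set(s.values()) == sigma


S = ("1", ("2", "3"))
G = (("a1", ("a2", "a3")), ("b2", "b3"))
s = {"a1": "1", "a2": "2", "a3": "3", "b2": "2", "b3": "3"}
print(is_gene_tree(G, s, S))                                   # True
print(is_gene_tree(("a2", "a3"), {"a2": "2", "a3": "3"}, S))   # False: species 1 has no gene
```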