Joint amalgamation of most parsimonious reconciled gene trees.

Scornavacca C, Jacox E, Szöllősi GJ - Bioinformatics (2014)

Bottom Line: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.


Affiliation: ISEM, UM2-CNRS-IRD, Place Eugène Bataillon, 34095 Montpellier, France; Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France; and ELTE-MTA 'Lendület' Biophysics Research Group, 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary.

Fig. 1. CCPs can be used to estimate the posterior probability of any tree that can be amalgamated from clades present in a sample of gene trees (David and Alm, 2010; Höhna and Drummond, 2012). Conditional clade frequencies can be used to approximate CCPs and are computed as the proportion of occurrences of a particular split of a clade according to a tripartition π, e.g. (abcde) among all trees in which the clade, e.g. (abcde), is found. Estimates based on the sample of trees on the left are shown as fractions for two different gene trees that can be amalgamated. The estimate for a gene tree is given by the sum of the reconciliation score and the logarithm of the tree CCPs. Based on the sample on the left, the tree with the highest posterior probability is the third tree (blue online). Reconciling it with the species tree requires one transfer and one loss event. It is, however, possible to combine clades present in the second (green online) and third (blue online) trees to produce a gene tree that is not present in the original sample but is identical to the species tree, i.e. it requires no events to draw it into the species tree. Depending on the costs of transfer and loss events, and the self-consistently estimated cA parameter, the scenario without transfer might be optimal for the joint score.
© Copyright Policy - creative-commons


Mentions: To circumvent this problem, David and Alm (2010) introduced the amalgamation algorithm, described in detail in Section 2.3 below and illustrated in Figure 1. Furthermore, Szöllősi et al. (2013b) recently developed an approach to exhaustively explore all reconciled gene trees that can be amalgamated from a sample of gene trees, i.e. obtainable by combining clades observed in the sample. Additionally, their method, ALE (Amalgamated Likelihood Estimation), combines the amalgamation algorithm of David and Alm (2010) with conditional clade probabilities (CCPs) introduced by Höhna and Drummond (2012) and reconstructs the gene phylogenies by optimizing a joint sequence-reconciliation likelihood score, resulting in gene trees that are dramatically more accurate than those reconstructed using molecular sequences alone.
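The conditional clade frequencies described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the TERA or ALE implementation: the nested-tuple tree encoding, the function names, and the three-tree sample are all assumptions made up for this example, and only the CCP part of the joint score is shown (no reconciliation).

```python
import math
from collections import Counter

def clade_of(tree):
    """Leaf set of a rooted binary tree given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return clade_of(left) | clade_of(right)

def splits_of(tree):
    """Yield (clade, split) for each internal node; split = the two child clades."""
    if isinstance(tree, str):
        return
    left, right = tree
    l, r = clade_of(left), clade_of(right)
    yield l | r, frozenset([l, r])
    yield from splits_of(left)
    yield from splits_of(right)

def conditional_clade_frequencies(sample):
    """Fraction of trees containing a clade in which that clade is split a given way."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        for clade, split in splits_of(tree):
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return {key: n / clade_counts[key[0]] for key, n in split_counts.items()}

def log_ccp(tree, ccf):
    """Log of the product of conditional clade frequencies over the tree's splits."""
    total = 0.0
    for key in splits_of(tree):
        freq = ccf.get(key, 0.0)
        if freq == 0.0:
            return float("-inf")  # tree uses a split never seen in the sample
        total += math.log(freq)
    return total

# Hypothetical toy sample (not the trees drawn in Figure 1).
sample = [
    ((("a", "b"), "c"), ("d", "e")),
    ((("a", "b"), "c"), ("d", "e")),
    ((("a", ("b", "c")), "d"), "e"),
]
ccf = conditional_clade_frequencies(sample)

# A tree absent from the sample but built only from sampled splits.
amalgamated = (("a", ("b", "c")), ("d", "e"))
print(f"{math.exp(log_ccp(amalgamated, ccf)):.3f}")  # 0.222
```

In this toy sample the amalgamated tree receives probability 2/3 × 1/3 = 2/9 even though it was never sampled, which is the property that lets amalgamation explore trees beyond the sample. A tree containing a split that never occurs in the sample gets probability zero under this estimate.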

