The inference of gene trees with species trees.

Szöllősi GJ, Tannier E, Daubin V, Boussau B - Syst. Biol. (2014)

Bottom Line: Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.


Affiliation: ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France.

Figure 5: Gene tree–species tree models in the context of the phylogenomics inference pipeline. Left: the inference pipeline (some steps, such as sequencing error correction, are not represented). Right: graphical representation of the inferential problem for a selection of the models and associated phylogenetic software discussed in the main text. The sequence of steps in each graphical model corresponds to the hierarchical sequence of evolutionary processes that generate genomic sequences (cf. Fig. 1). The likelihood that must be computed is also shown. Standard graphical model conventions are followed: among stochastic nodes, those corresponding to data considered known are gray, and those whose states are inferred are white. The models have been simplified, and parameters other than the gene tree and the species tree are not represented.
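As a rough guide to the kind of likelihood these graphical models encode, the simplest hierarchical case (a species tree S generating a gene tree G, which in turn generates an alignment A) can be written as below. This is a generic sketch, assuming a single gene tree per family, fixed model parameters (branch lengths, rates), and, for the last line, independence between gene families; it is not a transcription of the specific likelihoods shown in Fig. 5.

```latex
% Joint probability factorizes along the hierarchy S -> G -> A
P(A, G \mid S) = P(A \mid G)\, P(G \mid S)

% If the alignment A is observed (gray) and G is inferred (white),
% the likelihood of the species tree marginalizes over gene trees:
P(A \mid S) = \sum_{G} P(A \mid G)\, P(G \mid S)

% Assuming N independent gene families with alignments A_1, ..., A_N:
P(A_1, \dots, A_N \mid S) = \prod_{i=1}^{N} \sum_{G_i} P(A_i \mid G_i)\, P(G_i \mid S)
```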

Mentions: One can see a phylogenetic pipeline as a series of statistical inferences, starting from raw sequences coming out of sequencing machines and finishing with the inference of a species tree (Fig. 5). Necessary steps include sequencing error correction, assembly of reads into contigs and scaffolds, gene annotation, gene family clustering, alignment, and tree reconstruction. Most of these steps are performed sequentially: later steps in the pipeline entirely disregard any estimate of uncertainty from previous steps and provide no feedback to them. Gene tree–species tree models take a step toward a more principled approach by allowing communication between two steps of this pipeline: the construction of gene trees and the construction of a species tree. Figure 5 places the models discussed above and their associated phylogenetic software in the context of the complete phylogenetic inference pipeline. Gray nodes are considered known, and white nodes are inferred. The figure shows that a wide diversity of inferential problems has been addressed, treating gene alignments, gene trees, species trees, or several of these as data. In this section, we review some of the methods and algorithms that have been used to address these inferential problems. We do not discuss methods for reconstructing alignments and instead focus on gene tree–species tree methods. Consequently, in what follows we use “probability of an alignment” loosely to mean the probability arising from substitution events alone, or jointly from substitution and insertion–deletion events. We present how data can be simulated, how the likelihood of a gene tree or of a species tree can be computed efficiently, and how good gene trees and species trees can be searched for.
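To make the contrast between sequential and joint inference concrete, here is a minimal Python sketch over a hypothetical set of three candidate gene trees with made-up probabilities. The sequential approach keeps only the gene tree with the highest alignment likelihood P(A | G); a gene tree–species tree model instead weighs each candidate by P(A | G) P(G | S) for a fixed species tree S. The tree labels, values, and function names are illustrative assumptions, not taken from any of the software in Fig. 5.

```python
# Toy comparison of sequential vs. joint gene tree inference.
# All trees and probabilities are hypothetical illustration values,
# not output from any real phylogenetic software.

# P(A | G): likelihood of the alignment under each candidate gene tree,
# as a sequence-evolution model might compute it (hypothetical values).
alignment_likelihood = {"G1": 0.30, "G2": 0.28, "G3": 0.05}

# P(G | S): probability of each gene tree under a gene tree-species tree
# model for one fixed species tree S (hypothetical values).
gene_tree_prior = {"G1": 0.10, "G2": 0.60, "G3": 0.30}


def sequential_estimate(lik):
    """Pick the gene tree maximizing P(A | G) alone, ignoring S."""
    return max(lik, key=lik.get)


def joint_posterior(lik, prior):
    """Return (P(G | A, S) for each G, P(A | S)).

    The posterior is proportional to P(A | G) * P(G | S); the normalizing
    constant is the marginal likelihood P(A | S) = sum_G P(A | G) P(G | S).
    """
    unnormalized = {g: lik[g] * prior[g] for g in lik}
    evidence = sum(unnormalized.values())
    return {g: w / evidence for g, w in unnormalized.items()}, evidence


best_sequential = sequential_estimate(alignment_likelihood)
posterior, evidence = joint_posterior(alignment_likelihood, gene_tree_prior)
best_joint = max(posterior, key=posterior.get)

print("sequential choice:", best_sequential)
print("joint choice:", best_joint)
print(f"P(A | S) = {evidence:.4f}")
for g, p in sorted(posterior.items()):
    print(f"P({g} | A, S) = {p:.3f}")
```

With these numbers the sequential choice is G1, while the joint choice is G2: a gene tree with a slightly lower alignment likelihood but a much higher probability under the species tree wins. Real methods search tree space rather than enumerating a handful of candidates, and some also infer S itself.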

