A flexible ancestral genome reconstruction method based on gapped adjacencies.

Gagnon Y, Blanchette M, El-Mabrouk N - BMC Bioinformatics (2012)

Bottom Line: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Ancestral relationships between markers are defined in terms of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. Applying our algorithm to various simulated data sets reveals good performance, as we usually end up with a completely assembled genome while keeping a low error rate.


Affiliation: Département d'Informatique, DIRO, Université de Montréal, Canada.

ABSTRACT

Background: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods, which aim at minimizing a total branch length, and the synteny-based (or mapping) methods, which first predict a collection of relations between ancestral markers in terms of "synteny" and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable, as they are more directly deduced from observed conservations in extant species. However, the challenge is to end up with a completely assembled genome.
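The assembly step of a synteny-based method can be pictured with a minimal sketch. It assumes unsigned markers, linear chromosomes, and a conflict-free set of predicted adjacencies. The function name `assemble_cars` and the toy data are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def assemble_cars(markers: Iterable[str],
                  adjacencies: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Chain predicted adjacencies into linear CARs (illustrative sketch only)."""
    marker_list = sorted(set(markers))
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in adjacencies:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Assumption: the adjacency set is conflict-free (at most two neighbours each).
    if any(len(n) > 2 for n in neighbours.values()):
        raise ValueError("conflicting adjacencies: a marker has more than two neighbours")

    cars: List[List[str]] = []
    seen: Set[str] = set()
    for start in marker_list:
        # Begin each walk at a path end, i.e. a marker with at most one neighbour.
        if start in seen or len(neighbours[start]) > 1:
            continue
        car, prev, cur = [], None, start
        while cur is not None:
            car.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev and n not in seen]
            prev, cur = cur, (nxt[0] if nxt else None)
        cars.append(car)
    # Markers never reached lie on a cycle, which this sketch does not handle.
    if len(seen) != len(marker_list):
        raise ValueError("circular adjacency chain: not handled by this sketch")
    return cars


# Toy example: three adjacencies over five markers leave the genome in two CARs.
cars = assemble_cars(["a", "b", "c", "d", "e"],
                     [("a", "b"), ("b", "c"), ("d", "e")])
```

In this toy case the missing link between c and d leaves two CARs rather than one, which is the kind of incomplete assembly the paragraph above describes.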

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in terms of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous method restricted to direct adjacencies, which achieved high accuracy for adjacency prediction but had the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm to various simulated data sets reveals good performance, as we usually end up with a completely assembled genome while keeping a low error rate.
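To make the definition of a gapped adjacency concrete, the following minimal sketch enumerates all pairs of markers on a single linear chromosome that are separated by at most a given number of intervening markers. It ignores marker orientation, and the names `gapped_adjacencies` and `max_gap` are hypothetical. It illustrates the definition only, not the authors' reconstruction algorithm.

```python
from typing import List, Set, Tuple


def gapped_adjacencies(order: List[str], max_gap: int) -> Set[Tuple[str, str, int]]:
    """Return (left, right, gap) triples for marker pairs separated by <= max_gap markers."""
    pairs: Set[Tuple[str, str, int]] = set()
    for i, left in enumerate(order):
        # Exactly j - i - 1 markers lie between positions i and j.
        for j in range(i + 1, min(i + max_gap + 2, len(order))):
            pairs.add((left, order[j], j - i - 1))
    return pairs


chromosome = ["a", "b", "c", "d"]
direct = gapped_adjacencies(chromosome, 0)  # direct adjacencies only
gapped = gapped_adjacencies(chromosome, 1)  # also pairs with one marker in between
```

With `max_gap = 0` only the three direct adjacencies are produced. Allowing a gap of 1 adds the pairs (a, c) and (b, d), which is the extra evidence a gapped approach can use when a direct adjacency is not conserved.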

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.



Figure 4: (A) Evolution of the 11 yeast species recorded in the Yeast Gene Order Browser, as given by [29]. The * indicates partially sequenced organisms. At the leaves, the top number is the number of chromosomes, contigs or scaffolds. The bottom number is the number of genes, as reported in [18]. On each branch, the label is the number of gene losses, which is directly inferred from the gene content at the leaves. The simple circle is the root of the monophyletic group of non-duplicated species, referred to in the text as σ. (B) The phylogenetic tree for Oryza sativa (rice), Brachypodium distachyon (brachypodium) and Sorghum bicolor (sorghum). At the leaves, the top number is the number of chromosomes. The bottom number is the number of markers used in the study of cereal genomes.

Mentions: To evaluate the accuracy and running time of our approach, we first used data generated by simulated genome evolution. This allows us to dissect the impact of each aspect of the method and of the data on the accuracy of the reconstructed ancestor. Our simulations are based on the phylogenetic tree of yeast species shown in Figure 4 (A), which is ideal for this type of study as it contains a phylum affected by a whole-genome duplication and another that remains non-duplicated. Each of the simulation-based results reported in this section is averaged over 50 repetitions.

