Limits...
A flexible ancestral genome reconstruction method based on gapped adjacencies.

Gagnon Y, Blanchette M, El-Mabrouk N - BMC Bioinformatics (2012)

Bottom Line: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species.Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers.Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

View Article: PubMed Central - HTML - PubMed

Affiliation: Département d'Informatique, DIRO, Université de Montréal, Canada.

ABSTRACT

Background: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods aiming at minimizing a total branch length, and the synteny-based (or mapping) methods that first predict a collection of relations between ancestral markers in term of "synteny", and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable as they are more directly deduced from observed conservations in extant species. However the challenge is to end up with a completely assembled genome.

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous restricted to direct adjacencies, which revealed a high accuracy for adjacency prediction, but with the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.

Show MeSH

Related in: MedlinePlus

An illustration of Step 1 for a gene g and an internal node u. Numbers in brackets indicate the multiplicity of gene g at each node of the tree. Multisets at leaves represent (say left) adjacencies of gene g in the corresponding genome. All multisets X of possible adjacencies of g at node u are shown, followed by the value of LeftAdj(g, S/LA(g,1,G(u))=X). The rest of notation illustrates how the value 6 is obtained at u for the multiset {a, a}: the root and WGD node labels are the adjacencies that have to be set for g, and the label of an edge (v, w) is the number of conserved adjacencies for g on that branch.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3526437&req=5

Figure 2: An illustration of Step 1 for a gene g and an internal node u. Numbers in brackets indicate the multiplicity of gene g at each node of the tree. Multisets at leaves represent (say left) adjacencies of gene g in the corresponding genome. All multisets X of possible adjacencies of g at node u are shown, followed by the value of LeftAdj(g, S/LA(g,1,G(u))=X). The rest of notation illustrates how the value 6 is obtained at u for the multiset {a, a}: the root and WGD node labels are the adjacencies that have to be set for g, and the label of an edge (v, w) is the number of conserved adjacencies for g on that branch.

Mentions: Step 1: For each internal node u of the tree (speciation or WGD node), each gene , and each multiset X of possible adjacencies of g at node u, we compute LeftAdj(g, S/LA(g,1,G(u)) = X) and RightAdj(g, S/LA(g,1,G(u)) = X) using a Dynamic Programming Algorithm. The values at a node u are computed from the values at the two children and also at the parent of u (see Figure 2 for an example).


A flexible ancestral genome reconstruction method based on gapped adjacencies.

Gagnon Y, Blanchette M, El-Mabrouk N - BMC Bioinformatics (2012)

An illustration of Step 1 for a gene g and an internal node u. Numbers in brackets indicate the multiplicity of gene g at each node of the tree. Multisets at leaves represent (say left) adjacencies of gene g in the corresponding genome. All multisets X of possible adjacencies of g at node u are shown, followed by the value of LeftAdj(g, S/LA(g,1,G(u))=X). The rest of notation illustrates how the value 6 is obtained at u for the multiset {a, a}: the root and WGD node labels are the adjacencies that have to be set for g, and the label of an edge (v, w) is the number of conserved adjacencies for g on that branch.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3526437&req=5

Figure 2: An illustration of Step 1 for a gene g and an internal node u. Numbers in brackets indicate the multiplicity of gene g at each node of the tree. Multisets at leaves represent (say left) adjacencies of gene g in the corresponding genome. All multisets X of possible adjacencies of g at node u are shown, followed by the value of LeftAdj(g, S/LA(g,1,G(u))=X). The rest of notation illustrates how the value 6 is obtained at u for the multiset {a, a}: the root and WGD node labels are the adjacencies that have to be set for g, and the label of an edge (v, w) is the number of conserved adjacencies for g on that branch.
Mentions: Step 1: For each internal node u of the tree (speciation or WGD node), each gene , and each multiset X of possible adjacencies of g at node u, we compute LeftAdj(g, S/LA(g,1,G(u)) = X) and RightAdj(g, S/LA(g,1,G(u)) = X) using a Dynamic Programming Algorithm. The values at a node u are computed from the values at the two children and also at the parent of u (see Figure 2 for an example).

Bottom Line: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species.Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers.Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

View Article: PubMed Central - HTML - PubMed

Affiliation: Département d'Informatique, DIRO, Université de Montréal, Canada.

ABSTRACT

Background: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods aiming at minimizing a total branch length, and the synteny-based (or mapping) methods that first predict a collection of relations between ancestral markers in term of "synteny", and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable as they are more directly deduced from observed conservations in extant species. However the challenge is to end up with a completely assembled genome.

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous restricted to direct adjacencies, which revealed a high accuracy for adjacency prediction, but with the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.

Show MeSH
Related in: MedlinePlus