Limits...
A flexible ancestral genome reconstruction method based on gapped adjacencies.

Gagnon Y, Blanchette M, El-Mabrouk N - BMC Bioinformatics (2012)

Bottom Line: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species.Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers.Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

View Article: PubMed Central - HTML - PubMed

Affiliation: Département d'Informatique, DIRO, Université de Montréal, Canada.

ABSTRACT

Background: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods aiming at minimizing a total branch length, and the synteny-based (or mapping) methods that first predict a collection of relations between ancestral markers in term of "synteny", and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable as they are more directly deduced from observed conservations in extant species. However the challenge is to end up with a completely assembled genome.

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous restricted to direct adjacencies, which revealed a high accuracy for adjacency prediction, but with the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.

Show MeSH

Related in: MedlinePlus

A species tree for the set of species Γ = {1, 2, 3}, with two different genome assignments at leaves. Example (B) depicts a most parsimonious inversion scenario leading to the observed genomes.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3526437&req=5

Figure 3: A species tree for the set of species Γ = {1, 2, 3}, with two different genome assignments at leaves. Example (B) depicts a most parsimonious inversion scenario leading to the observed genomes.

Mentions: Many adjacencies in an ancestral genome are likely to be no longer present in some present-day genomes due to rearrangements and content-modifying operations, preventing from reconstructing large CARs. However, since small and local evolutionary events are more frequent than large and far-reaching operations [21], we can expect to reconnect neighboring CARs by considering gapped adjacencies. Consider for example the species tree (A) of Figure 3. As a and b are "neighboring" (close) genes in all three genomes, we expect the inferred ancestral genome at the root of the tree to have a CAR with genes a and b. However, as all (right) direct adjacencies of a are different (b in 1, -b in 2 and x in 3), none of these adjacencies would have a score attaining a reasonable minimum cost τ for the TSP, and a and b will end up in two different CARs with algorithm DirectAdj. However, as b (and also -b) is a 2 - adjacency of a in two extent genomes, and a 3 - adjacency of a in all three genomes, they end up in the same CAR if we consider 2 or 3-adjacencies (second or third iteration of GapAdj algorithm described in the next section). As another example, consider a "true" evolutionary scenario depicted in Figure 3.(B). Consider a threshold τ for TSP-τ corresponding to an adjacency being present in two of the three extent species. Then, as the only direct adjacency present at least twice in extant genomes is bc, DirectAdj leaves a and bc in two separate CARs. However, as b is a 2-adjacency of a in species 1 and 2 (it is actually the only adjacency reaching the threshold up to α = 2), GapAdj would end up with a CAR containing the sequence abc after iteration α = 2.


A flexible ancestral genome reconstruction method based on gapped adjacencies.

Gagnon Y, Blanchette M, El-Mabrouk N - BMC Bioinformatics (2012)

A species tree for the set of species Γ = {1, 2, 3}, with two different genome assignments at leaves. Example (B) depicts a most parsimonious inversion scenario leading to the observed genomes.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3526437&req=5

Figure 3: A species tree for the set of species Γ = {1, 2, 3}, with two different genome assignments at leaves. Example (B) depicts a most parsimonious inversion scenario leading to the observed genomes.
Mentions: Many adjacencies in an ancestral genome are likely to be no longer present in some present-day genomes due to rearrangements and content-modifying operations, preventing from reconstructing large CARs. However, since small and local evolutionary events are more frequent than large and far-reaching operations [21], we can expect to reconnect neighboring CARs by considering gapped adjacencies. Consider for example the species tree (A) of Figure 3. As a and b are "neighboring" (close) genes in all three genomes, we expect the inferred ancestral genome at the root of the tree to have a CAR with genes a and b. However, as all (right) direct adjacencies of a are different (b in 1, -b in 2 and x in 3), none of these adjacencies would have a score attaining a reasonable minimum cost τ for the TSP, and a and b will end up in two different CARs with algorithm DirectAdj. However, as b (and also -b) is a 2 - adjacency of a in two extent genomes, and a 3 - adjacency of a in all three genomes, they end up in the same CAR if we consider 2 or 3-adjacencies (second or third iteration of GapAdj algorithm described in the next section). As another example, consider a "true" evolutionary scenario depicted in Figure 3.(B). Consider a threshold τ for TSP-τ corresponding to an adjacency being present in two of the three extent species. Then, as the only direct adjacency present at least twice in extant genomes is bc, DirectAdj leaves a and bc in two separate CARs. However, as b is a 2-adjacency of a in species 1 and 2 (it is actually the only adjacency reaching the threshold up to α = 2), GapAdj would end up with a CAR containing the sequence abc after iteration α = 2.

Bottom Line: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species.Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers.Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

View Article: PubMed Central - HTML - PubMed

Affiliation: Département d'Informatique, DIRO, Université de Montréal, Canada.

ABSTRACT

Background: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods aiming at minimizing a total branch length, and the synteny-based (or mapping) methods that first predict a collection of relations between ancestral markers in term of "synteny", and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable as they are more directly deduced from observed conservations in extant species. However the challenge is to end up with a completely assembled genome.

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous restricted to direct adjacencies, which revealed a high accuracy for adjacency prediction, but with the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate.

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.

Show MeSH
Related in: MedlinePlus