Limits...
Mapping ancestral genomes with massive gene loss: a matrix sandwich problem.

Gavranović H, Chauve C, Salse J, Tannier E - Bioinformatics (2011)

Bottom Line: We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration.We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content.Upon request to the authors. haris.gavranovic@gmail.com; eric.tannier@inria.fr.

View Article: PubMed Central - PubMed

Affiliation: Faculty of Natural Sciences, University of Sarajevo, Bosnia and Herzegovina. haris.gavranovic@gmail.com

ABSTRACT

Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto-monocotyledon whose evolution involved massive gene loss following a whole genome duplication.

Results: We propose a mapping approach based on the combinatorial notion of 'sandwich consecutive ones matrix', which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content.

Availability: Upon request to the authors.

Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr.

Show MeSH
Two sub-matrices of the matrix displayed in Figure 6. (Left) Sub-matrix defined by rows (13, 17 and 42) and columns (7, 10, 11 and 12). This matrix has the SC1P, with columns order dabc. (Right) Sub-matrix defined by rows (14, 12, 13, 16 and 43) and columns (1, 6, 7, 8, 12 and 13). This matrix does not have the SC1P (Fig. 3). The existence of this sub-matrix proves the optimality of the solution in the Figure 6, because there is only one bad row in the solution.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3117370&req=5

Figure 1: Two sub-matrices of the matrix displayed in Figure 6. (Left) Sub-matrix defined by rows (13, 17 and 42) and columns (7, 10, 11 and 12). This matrix has the SC1P, with columns order dabc. (Right) Sub-matrix defined by rows (14, 12, 13, 16 and 43) and columns (1, 6, 7, 8, 12 and 13). This matrix does not have the SC1P (Fig. 3). The existence of this sub-matrix proves the optimality of the solution in the Figure 6, because there is only one bad row in the solution.

Mentions: Typically, we use a ternary matrix M to represent a set of features of an ancestral genome of interest: columns represent ancestral markers (such as ancestral genes for example), and each row represents a group of markers that are believed to be contiguous in this ancestral genome, with all columns with an entry 1 belonging to this group and possibly some columns with an entry X. The goal of the assembly phase in inferring an ancestral genome map is to order the columns of M to represent a possible order of the markers along the proto-chromosome of the ancestral genome. Ideally one would like to find a column order such that M has the SC1P. In practice, it happens, due to convergent evolution, errors in gene annotation or in homology assignment, that a matrix does not have the SC1P, that is, there is no permutation of the columns such that in each row, no zero entry is between two ones (Fig. 1).Fig. 1.


Mapping ancestral genomes with massive gene loss: a matrix sandwich problem.

Gavranović H, Chauve C, Salse J, Tannier E - Bioinformatics (2011)

Two sub-matrices of the matrix displayed in Figure 6. (Left) Sub-matrix defined by rows (13, 17 and 42) and columns (7, 10, 11 and 12). This matrix has the SC1P, with columns order dabc. (Right) Sub-matrix defined by rows (14, 12, 13, 16 and 43) and columns (1, 6, 7, 8, 12 and 13). This matrix does not have the SC1P (Fig. 3). The existence of this sub-matrix proves the optimality of the solution in the Figure 6, because there is only one bad row in the solution.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3117370&req=5

Figure 1: Two sub-matrices of the matrix displayed in Figure 6. (Left) Sub-matrix defined by rows (13, 17 and 42) and columns (7, 10, 11 and 12). This matrix has the SC1P, with columns order dabc. (Right) Sub-matrix defined by rows (14, 12, 13, 16 and 43) and columns (1, 6, 7, 8, 12 and 13). This matrix does not have the SC1P (Fig. 3). The existence of this sub-matrix proves the optimality of the solution in the Figure 6, because there is only one bad row in the solution.
Mentions: Typically, we use a ternary matrix M to represent a set of features of an ancestral genome of interest: columns represent ancestral markers (such as ancestral genes for example), and each row represents a group of markers that are believed to be contiguous in this ancestral genome, with all columns with an entry 1 belonging to this group and possibly some columns with an entry X. The goal of the assembly phase in inferring an ancestral genome map is to order the columns of M to represent a possible order of the markers along the proto-chromosome of the ancestral genome. Ideally one would like to find a column order such that M has the SC1P. In practice, it happens, due to convergent evolution, errors in gene annotation or in homology assignment, that a matrix does not have the SC1P, that is, there is no permutation of the columns such that in each row, no zero entry is between two ones (Fig. 1).Fig. 1.

Bottom Line: We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration.We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content.Upon request to the authors. haris.gavranovic@gmail.com; eric.tannier@inria.fr.

View Article: PubMed Central - PubMed

Affiliation: Faculty of Natural Sciences, University of Sarajevo, Bosnia and Herzegovina. haris.gavranovic@gmail.com

ABSTRACT

Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto-monocotyledon whose evolution involved massive gene loss following a whole genome duplication.

Results: We propose a mapping approach based on the combinatorial notion of 'sandwich consecutive ones matrix', which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content.

Availability: Upon request to the authors.

Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr.

Show MeSH