Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data.

Lemoine F, Lespinet O, Labedan B - BMC Evol. Biol. (2007)

Affiliation: Institut de Génétique et Microbiologie, CNRS UMR 8621, Bâtiment 400, Université Paris Sud XI, 91405 Orsay Cedex, France. frederic.lemoine@igmors.u-psud.fr

ABSTRACT

Background: Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods for determining the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in comparative genomics, but solving it will help us better understand how prokaryotic genomes evolve.

Results: We have developed a suite of programs that automates three essential steps in the study of gene order conservation, and we validated them on a set of 107 bacteria and archaea covering the majority of the prokaryotic taxonomic space. We identified the whole set of homologs shared by two or more species and computed the evolutionary distance separating each pair of homologs. We then applied two strategies to extract, from this set of homologs, a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second groups homologs into families and reconstructs each family's evolutionary tree, identifying bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse is occasionally true. Accordingly, we used the data obtained with either method, or their intersection, to count the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to stronger evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms.
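The RSD criterion can be illustrated with a minimal sketch. It assumes the PAM distances have already been computed and are supplied as a dictionary keyed by (gene in genome A, gene in genome B); the function name and input layout are hypothetical, and any alignment or divergence cutoffs a production pipeline would apply are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: PAM distance for each homologous pair (gene in A, gene in B).
Distances = Dict[Tuple[str, str], float]


def reciprocal_smallest_distance(dist: Distances) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's closest homolog in B and a is b's closest homolog in A."""
    best_for_a: Dict[str, Tuple[float, str]] = {}
    best_for_b: Dict[str, Tuple[float, str]] = {}
    for (a, b), d in dist.items():
        # Keep the smallest distance seen so far for each gene; ties keep the first pair seen.
        if a not in best_for_a or d < best_for_a[a][0]:
            best_for_a[a] = (d, b)
        if b not in best_for_b or d < best_for_b[b][0]:
            best_for_b[b] = (d, a)
    pairs = []
    for a, (_, b) in best_for_a.items():
        # The relationship must hold in both directions to be reported.
        if best_for_b[b][1] == a:
            pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical PAM distances between genes of two genomes.
    dist = {("a1", "b1"): 20.0, ("a1", "b2"): 35.0,
            ("a2", "b1"): 15.0, ("a2", "b2"): 60.0,
            ("a3", "b3"): 80.0}
    print(reciprocal_smallest_distance(dist))  # [('a2', 'b1'), ('a3', 'b3')]
```

In this toy example a1 is left unpaired because its closest homolog, b1, is even closer to a2. A purely distance-based criterion cannot resolve such cases, which is one reason a tree-based method can succeed where RSD fails.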

Conclusion: The suite of programs described in this paper allows reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes, whatever their taxonomic distance. Our approach should therefore make the rapid identification of POGs easy in the coming years, as we expect to be inundated with thousands of completely sequenced microbial genomes.
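Counting positional orthologs can be sketched as follows, under a deliberately simple adjacency rule that is an assumption here: an ortholog pair (a, b) is counted as positional if at least one immediate neighbour of a in genome A is orthologous to an immediate neighbour of b in genome B. Gene orders are given as lists, and all names are hypothetical. The exact synteny-block criteria used by the authors (for instance, tolerance for intervening genes) are not stated in this excerpt.

```python
from typing import Dict, Iterable, List, Set, Tuple


def neighbours(order: List[str], circular: bool) -> Dict[str, Set[str]]:
    """Map each gene to the genes immediately before and after it on the chromosome."""
    n = len(order)
    nb: Dict[str, Set[str]] = {g: set() for g in order}
    for i, gene in enumerate(order):
        for j in (i - 1, i + 1):
            if circular:
                j %= n  # wrap around, since most prokaryotic chromosomes are circular
            if 0 <= j < n and j != i:
                nb[gene].add(order[j])
    return nb


def positional_orthologs(order_a: List[str], order_b: List[str],
                         orthologs: Iterable[Tuple[str, str]],
                         circular: bool = True) -> Set[Tuple[str, str]]:
    """Return ortholog pairs that have another ortholog pair as immediate neighbours in both genomes."""
    nb_a = neighbours(order_a, circular)
    nb_b = neighbours(order_b, circular)
    ortho = set(orthologs)
    pogs: Set[Tuple[str, str]] = set()
    for a, b in ortho:
        # Neighbours on either side are checked, so an inverted segment still counts.
        if any((x, y) in ortho for x in nb_a.get(a, set()) for y in nb_b.get(b, set())):
            pogs.add((a, b))
    return pogs


if __name__ == "__main__":
    genome_a = ["a1", "a2", "a3", "a4", "a5"]
    genome_b = ["b9", "b3", "b2", "b1", "b7", "b5"]  # a1..a3 block appears inverted in B
    pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b5")]
    print(sorted(positional_orthologs(genome_a, genome_b, pairs)))
    # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
```

Here (a5, b5) are orthologs but not positional, because their neighbours (a4, a1 and b7, b9) are not orthologous to one another.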

Figure 2: Phylogenetic tree showing the pros and cons of both ortholog detection methods. We used PhyML [59] to reconstruct a maximum likelihood tree for family 4565, which groups the chemotaxis proteins CheC. The table on the right summarizes the orthologs listed for the Archaeoglobus fulgidus (O29223_ARCFU) and Bacillus subtilis (CHEC_BACSU) sequences, respectively. Orthologs found by the RSD approach are indicated by a star, those detected by the phylogeny approach by black triangles, and those found by both methods by a chevron. All nodes are assumed to correspond to speciation events except those labeled with a black dot, which are assumed to result from gene duplication events.
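The distinction between speciation and duplication nodes can be sketched with the species-overlap rule, a common heuristic that is swapped in here for illustration and is not necessarily the rule the authors applied. The sketch assumes a rooted binary tree written as nested tuples, with Swiss-Prot-style leaf names whose species code follows the underscore (as in CHEC_BACSU); the species codes SPA and SPB and the function names are hypothetical. A node is labelled a duplication if its two subtrees share a species, and two genes are reported as orthologs if their lowest common ancestor is a speciation node.

```python
from typing import List, Set, Tuple, Union

# Rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species(leaf: str) -> str:
    """Species code after the last underscore, e.g. 'BACSU' in 'CHEC_BACSU'."""
    return leaf.rsplit("_", 1)[-1]


def leaves(t: Tree) -> List[str]:
    if isinstance(t, str):
        return [t]
    return leaves(t[0]) + leaves(t[1])


def is_duplication(node: Tuple[Tree, Tree]) -> bool:
    """Species-overlap rule: a node is a duplication if both subtrees contain a common species."""
    left = {species(x) for x in leaves(node[0])}
    right = {species(x) for x in leaves(node[1])}
    return bool(left & right)


def orthologs_of(tree: Tree, query: str) -> Set[str]:
    """Genes whose lowest common ancestor with `query` is a speciation node."""
    result: Set[str] = set()
    node = tree
    # Walk from the root down to the query; genes on the other side of each
    # node on this path have that node as their lowest common ancestor with the query.
    while not isinstance(node, str):
        left, right = node
        if query in leaves(left):
            inside, other = left, right
        elif query in leaves(right):
            inside, other = right, left
        else:
            raise ValueError(f"{query} is not a leaf of the tree")
        if not is_duplication(node):
            result.update(leaves(other))
        node = inside
    return result


if __name__ == "__main__":
    # Hypothetical tree: A1 and A2 are paralogs in species SPA that arose after the split from SPB.
    tree = (("A1_SPA", "A2_SPA"), "B1_SPB")
    print(sorted(orthologs_of(tree, "B1_SPB")))  # ['A1_SPA', 'A2_SPA']
    print(sorted(orthologs_of(tree, "A1_SPA")))  # ['B1_SPB']
```

The first call shows how paralogs created after the last speciation event are all reported as orthologs of the gene in the other species, which is the behaviour described for the tree-based method.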

Mentions: After this step, one extremely large protein "family" remained. This heterogeneous cluster contained 107,219 members, mainly hydrophobic proteins such as transporters and other membrane proteins, including 20,607 proteins of unknown function. Such a huge gathering of disparate proteins is biologically meaningless. Moreover, its size and complexity made it clearly pointless to analyze with a tree approach (see below, Figs. 1 and 2). We therefore applied the MCL algorithm [33,34] to break up this huge, heterogeneous cluster.
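The core MCL iteration (expansion by matrix squaring, inflation by an elementwise power followed by column renormalization) can be sketched in pure Python on a small dense matrix. This is an illustration only: the inflation value, tolerance, self-loop preprocessing and edge weights are assumptions, and a cluster of 107,219 proteins would require a sparse implementation such as the mcl program rather than this dense version.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> List[Set[int]]:
    n = len(adjacency)
    # Add self-loops (a common MCL preprocessing step), then make columns stochastic.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: random-walk flow spreads out
        # Inflation: raise entries to a power, strengthening strong flows, then renormalize.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps non-negligible entries acts as an attractor; its columns form a cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical similarity graph: two triangles of proteins with no edge between them.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
    adj = [[0.0] * 6 for _ in range(6)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    print(mcl(adj))  # [{0, 1, 2}, {3, 4, 5}]
```

Higher inflation values generally produce finer clusters, which is the knob one would tune when splitting an oversized, heterogeneous family.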

