EDGAR: a software framework for the comparative analysis of prokaryotic genomes.

Blom J, Albaum SP, Doppmeier D, Pühler A, Vorhölter FJ, Zakrzewski M, Goesmann A - BMC Bioinformatics (2009)

Bottom Line: As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes.


Affiliation: Computational Genomics, Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany. jblom@cebitec.uni-bielefeld.de

ABSTRACT

Background: The introduction of next-generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons.
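The classification task named above can be illustrated with a minimal sketch. It assumes that orthologous genes have already been grouped and that each group records the genomes in which it occurs. The function name, the group and genome identifiers, and the label "partial" for groups found in some but not all genomes are hypothetical and do not come from EDGAR.

```python
from typing import Dict, Iterable, Set


def classify_groups(groups: Dict[str, Set[str]], genomes: Iterable[str]) -> Dict[str, str]:
    """Label each ortholog group by the number of selected genomes that contain it.

    groups  maps a (hypothetical) ortholog group id to the genomes carrying a member.
    genomes is the set of genomes under comparison.
    """
    selected = set(genomes)
    labels: Dict[str, str] = {}
    for group_id, members in groups.items():
        present = len(members & selected)
        if present == 0:
            labels[group_id] = "absent"      # not found in any selected genome
        elif present == len(selected):
            labels[group_id] = "core"        # found in every selected genome
        elif present == 1:
            labels[group_id] = "singleton"   # found in exactly one selected genome
        else:
            labels[group_id] = "partial"     # assumed label for everything in between
    return labels


# Toy data, not taken from the paper.
toy_groups = {"g1": {"A", "B", "C"}, "g2": {"A"}, "g3": {"A", "B"}}
toy_labels = classify_groups(toy_groups, ["A", "B", "C"])
```

With the toy data, `g1` is labelled core, `g2` singleton and `g3` partial.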

Results: To support these studies, EDGAR ("Efficient Database framework for comparative Genome Analyses using BLAST score Ratios") was developed. EDGAR is designed to automatically perform genome comparisons in a high-throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies were awkward due to divergent taxonomic systems. The resultant phylogeny EDGAR provided was consistent with outcomes from traditional approaches performed recently, and it was moreover possible to root each strain with unprecedented accuracy.
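The abstract names BLAST score ratios but does not spell out the calculation. The sketch below uses the commonly cited definition, in which the bit score of a hit is divided by the bit score of the query aligned against itself, and combines it with a reciprocal-best-hit test. The data layout, the function names and the cutoff value are assumptions for illustration; EDGAR's actual cutoff and procedure are not given in this excerpt.

```python
from typing import Dict, List, Tuple

# query gene -> {subject gene: BLAST bit score}; this layout is an assumption.
Hits = Dict[str, Dict[str, float]]


def score_ratio(hit_bitscore: float, self_bitscore: float) -> float:
    """Bit score of a hit normalised by the query's self-hit bit score."""
    if self_bitscore <= 0:
        raise ValueError("self-hit bit score must be positive")
    return hit_bitscore / self_bitscore


def best_hit(hits: Dict[str, float]) -> str:
    """Subject gene with the highest bit score."""
    return max(hits, key=lambda gene: hits[gene])


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits,
                         self_a: Dict[str, float], self_b: Dict[str, float],
                         cutoff: float) -> List[Tuple[str, str]]:
    """Gene pairs that are each other's best hit and pass the ratio cutoff both ways."""
    pairs = []
    for gene_a, hits in a_vs_b.items():
        if not hits:
            continue
        gene_b = best_hit(hits)
        back = b_vs_a.get(gene_b, {})
        if not back or best_hit(back) != gene_a:
            continue  # best hits are not reciprocal
        forward = score_ratio(hits[gene_b], self_a[gene_a])
        reverse = score_ratio(back[gene_a], self_b[gene_b])
        if forward >= cutoff and reverse >= cutoff:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Toy bit scores, not taken from the paper; the cutoff 0.5 is hypothetical.
a_vs_b = {"a1": {"b1": 180.0, "b2": 60.0}, "a2": {"b2": 40.0}}
b_vs_a = {"b1": {"a1": 175.0}, "b2": {"a2": 45.0}}
self_a = {"a1": 200.0, "a2": 200.0}
self_b = {"b1": 190.0, "b2": 210.0}
orthologs = reciprocal_best_hits(a_vs_b, b_vs_a, self_a, self_b, cutoff=0.5)
```

Normalising by the self-hit score makes the measure comparable across genes of different lengths, which a raw bit score is not.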

Conclusion: EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, like synteny plots or Venn diagrams, are offered to the scientific community through a web-based and therefore platform-independent user interface at http://edgar.cebitec.uni-bielefeld.de, where the precomputed data sets can be browsed.


© Copyright Policy - open-access

Figure 7: Venn diagrams. EDGAR facilitates visualizing the common gene pools of genomes by Venn diagrams. This analysis exploits all CDS of the genomes and is not restricted to the core genome. At most 5 genomes can be included in each individual analysis, as considering more chromosomes results in a rather confusing visualization. Results for the X. campestris strains pathogenic to crucifers and for the rice-pathogenic X. oryzae strains, which clustered in the phylogenetic analysis (Figure 4), are displayed in panels A and C, respectively. Among the X. campestris chromosomes in panel A, a particularly high similarity between Xcc 33913 and Xcc 8004 became evident: these two chromosomes shared 178 orthologous CDS exclusively and a further 225 CDS together with strain Xca 756C. In panel C, among the X. oryzae genomes, the chromosomes of the X. oryzae pv. oryzae (Xoo) strains shared 375 orthologs, while the X. oryzae pv. oryzicola (Xoc) chromosome overlapped less with the Xoo chromosomes. In panel B, the Xac and Xcv chromosomes, which clustered between the X. campestris and X. oryzae groups, were compared with each other and with a representative of each of the X. campestris and X. oryzae groups. The analysis brought to light a surprisingly high number of 690 orthologs shared among Xac, Xcv and the Xoo representative, indicating closer connections of these strains to the X. oryzae group than to the crucifer-pathogenic X. campestris strains.
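The counts shown in such a Venn diagram can be derived by assigning each ortholog group to the exact combination of genomes in which it occurs. The following minimal sketch assumes that ortholog groups are already available as sets of genome names. The function name, the identifiers and the toy data are hypothetical, and only the limit of 5 genomes is taken from the caption.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

MAX_GENOMES = 5  # limit stated in the figure caption


def venn_regions(groups: Dict[str, Set[str]],
                 genomes: Iterable[str]) -> Dict[FrozenSet[str], int]:
    """Count ortholog groups per Venn region.

    A region is the exact set of selected genomes that contain a group, so a
    group shared by G1 and G2 only is counted in {G1, G2} and nowhere else.
    """
    selected = set(genomes)
    if not 1 <= len(selected) <= MAX_GENOMES:
        raise ValueError(f"select between 1 and {MAX_GENOMES} genomes")
    counts: Counter = Counter()
    for members in groups.values():
        region = frozenset(members & selected)
        if region:  # skip groups absent from every selected genome
            counts[region] += 1
    return dict(counts)


# Toy data with hypothetical genome names, not the Xanthomonas results.
toy = {
    "o1": {"G1", "G2", "G3"},
    "o2": {"G1", "G2"},
    "o3": {"G1", "G2"},
    "o4": {"G3"},
    "o5": {"G4"},
}
regions = venn_regions(toy, ["G1", "G2", "G3"])
```

Read this way, a caption figure such as "178 orthologous CDS exclusively" corresponds to a single region, here the one whose key contains exactly the two genomes in question.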

Mentions: The degree of gene order conservation among the Xanthomonas chromosomes, as apparent from the synteny analysis, seems well correlated with the phylogenetic order computed for the core genome CDS. Two taxonomic groups became evident, comprising the X. campestris strains pathogenic for crucifers and the X. oryzae strains pathogenic for rice. In between there was a third group consisting of Xac and Xcv 85-10. These three groups were further characterized by analyzing the distribution of orthologous CDS within the groups (Figure 7). Among the crucifer-pathogenic X. campestris strains (Figure 7A) there were particular overlaps between the genomes of strains Xcc 33913 and Xcc 8004. The genome of strain Xca 756C, which had been classified into the distinct pathovar "armoraciae", showed no outstanding role when compared to the Xcc strains. Among the X. oryzae chromosomes (Figure 7C) the Xoo strains had a large number of orthologs in common, which sets them apart from Xoc and reflects the different symptoms that Xoc provokes when affecting rice. The Xac/Xcv chromosomes, which branched off from the remaining xanthomonads together between the X. campestris and X. oryzae groups, shared many orthologs with an Xoo representative that was also included in the comparison (Figure 7B).

