EDGAR: a software framework for the comparative analysis of prokaryotic genomes.

Blom J, Albaum SP, Doppmeier D, Pühler A, Vorhölter FJ, Zakrzewski M, Goesmann A - BMC Bioinformatics (2009)

Bottom Line: As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes.

Affiliation: Computational Genomics, Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany. jblom@cebitec.uni-bielefeld.de

ABSTRACT

Background: The introduction of next-generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is identifying orthologous genes in different genomes and classifying genes as core genes or singletons.
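The core/singleton classification can be illustrated with a small Python sketch. It assumes orthology calls are already available as pairs of (genome, gene) identifiers and uses the usual definitions: a core gene has an ortholog in every other genome, a singleton has an ortholog in none. The function name `classify_genes`, its input format, and the label "other" for genes in between are hypothetical and not taken from EDGAR.

```python
from collections import defaultdict


def classify_genes(genes_by_genome, ortholog_pairs):
    """Label each (genome, gene) as 'core', 'singleton', or 'other'.

    genes_by_genome: {genome name: [gene IDs]}  (hypothetical input format)
    ortholog_pairs: iterable of ((genome_a, gene_a), (genome_b, gene_b)),
        assumed to be orthology calls between two different genomes
    """
    # For every gene, collect the set of *other* genomes holding an ortholog
    partner_genomes = defaultdict(set)
    for (genome_a, gene_a), (genome_b, gene_b) in ortholog_pairs:
        partner_genomes[(genome_a, gene_a)].add(genome_b)
        partner_genomes[(genome_b, gene_b)].add(genome_a)

    all_genomes = set(genes_by_genome)
    classes = {}
    for genome, genes in genes_by_genome.items():
        others = all_genomes - {genome}
        for gene in genes:
            hits = partner_genomes[(genome, gene)]
            if others and others <= hits:
                classes[(genome, gene)] = "core"       # ortholog in every other genome
            elif not hits:
                classes[(genome, gene)] = "singleton"  # ortholog in no other genome
            else:
                classes[(genome, gene)] = "other"      # shared by some, not all
    return classes
```

For two genomes A and B where only `a1` and `b1` are orthologs, `a1` and `b1` come out as core genes and any remaining gene of A as a singleton.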

Results: To support these studies, EDGAR ("Efficient Database framework for comparative Genome Analyses using BLAST score Ratios") was developed. EDGAR is designed to perform genome comparisons automatically in a high-throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software, and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies had been difficult because of divergent taxonomic systems. The phylogeny provided by EDGAR was consistent with the outcomes of recently performed traditional approaches; moreover, it was possible to root each strain with unprecedented accuracy.
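EDGAR's name refers to BLAST score ratios, but this excerpt does not define the score ratio value (SRV). The Python sketch below assumes the common BLAST score ratio definition: a hit's bit score divided by the bit score of the query gene aligned against itself, giving a fraction between 0 and 1. It further assumes, as one plausible way to apply a cutoff, that two genes are paired when they are reciprocal best hits whose SRVs both reach it. The function names and the nested-dict input are hypothetical.

```python
def score_ratio(hit_bitscore, self_bitscore):
    """SRV of one BLAST hit: hit bit score / query self-hit bit score (assumed definition)."""
    if self_bitscore <= 0:
        raise ValueError("self-hit bit score must be positive")
    return hit_bitscore / self_bitscore


def best_hits(srv_table):
    """Map each query gene to the subject gene with the highest SRV (ties: first listed)."""
    return {query: max(hits, key=hits.get) for query, hits in srv_table.items() if hits}


def reciprocal_best_hits(srv_ab, srv_ba, cutoff):
    """Pairs (a, b) that are mutual best hits and whose SRVs both reach `cutoff`.

    srv_ab: {gene in A: {gene in B: SRV}} from querying genome A against genome B
    srv_ba: the same structure for genome B queried against genome A
    """
    best_ab, best_ba = best_hits(srv_ab), best_hits(srv_ba)
    return [
        (a, b)
        for a, b in best_ab.items()
        # mutual best hit in both directions, and both ratios above the cutoff
        if best_ba.get(b) == a and srv_ab[a][b] >= cutoff and srv_ba[b][a] >= cutoff
    ]
```

For example, `score_ratio(150, 300)` returns `0.5`: the hit scores half of what the query scores against itself. Normalizing by the self-hit makes scores comparable across genes of different lengths, which is what allows a single cutoff per genus.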

Conclusion: EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, such as synteny plots and Venn diagrams, are offered to the scientific community through a web-based, and therefore platform-independent, user interface at http://edgar.cebitec.uni-bielefeld.de, where the precomputed data sets can be browsed.

Figure 2: SRVs for the Corynebacterium genus. Histogram of the SRVs (multiplied by 100 to obtain percentages) resulting from the comparison of two Corynebacterium strains. The histogram shows no clear peak in the high-score region. The lowest-scoring window lies at positions 90–100, and the lowest single value is at 98%. A cutoff placed there would exclude the vast majority of all BLAST hits in this comparison. For this reason, the cutoff for genome comparisons that show no bimodal distribution is automatically set to 35%.

Mentions: In some cases the SRV distribution does not show the expected bimodal shape. This mostly occurs when the genomes within a genus vary widely. A good example is the genus Corynebacterium, whose genomes are very diverse; this leads to an SRV distribution with only one peak at low similarities, followed by a broad plateau of medium scores that decays toward the highest scores. The cutoff calculation described above then places the cutoff at a very high SRV, thereby omitting the majority of all BLAST observations (see Figure 2). To overcome this problem, the master-cutoff is set to 35% if the majority of SRV histograms do not show a bimodal distribution. This value has proven to be a good cutoff for the genomes compared with EDGAR.
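The fallback rule can be sketched in Python. This is not EDGAR's implementation: the bimodality test, the 10-bin window width, and taking the median of the per-comparison cutoffs are hypothetical choices made for illustration. Only the percent scaling of the SRVs, the lowest-window idea, and the 35% fallback when most histograms are not bimodal come from the text.

```python
import statistics

FALLBACK_CUTOFF = 35  # percent; used when most SRV histograms are not bimodal
WINDOW = 10           # hypothetical window width, in percent bins


def percent_histogram(srvs):
    """Count SRVs (fractions in [0, 1]) multiplied by 100 into 101 bins, 0..100 percent."""
    hist = [0] * 101
    for v in srvs:
        hist[min(100, max(0, round(v * 100)))] += 1
    return hist


def find_peaks(hist, split=50):
    """Hypothetical bimodality test: return (low_peak, high_peak) bins, or None.

    Bimodal here means the tallest bins below and at/above `split` are separated
    by a valley lower than half of the smaller of the two peaks.
    """
    low = max(range(split), key=hist.__getitem__)
    high = max(range(split, len(hist)), key=hist.__getitem__)
    valley = min(hist[low:high + 1])
    if valley < 0.5 * min(hist[low], hist[high]):
        return low, high
    return None


def lowest_window(hist, start, stop, width=WINDOW):
    """Start bin of the window with the smallest total count inside hist[start:stop]."""
    width = max(1, min(width, stop - start))
    return min(range(start, stop - width + 1), key=lambda s: sum(hist[s:s + width]))


def master_cutoff(histograms):
    """Combine per-comparison cutoffs, falling back to 35% if most are not bimodal."""
    if not histograms:
        raise ValueError("need at least one SRV histogram")
    cutoffs, non_bimodal = [], 0
    for hist in histograms:
        peaks = find_peaks(hist)
        if peaks is None:
            non_bimodal += 1
        else:
            low, high = peaks
            cutoffs.append(lowest_window(hist, low, high + 1))
    if non_bimodal > len(histograms) / 2:  # strict majority is not bimodal
        return FALLBACK_CUTOFF
    return statistics.median(cutoffs)  # assumption: median of the bimodal cutoffs
```

A single-peaked histogram such as `percent_histogram([0.1] * 100)` is rejected by `find_peaks`, so a genus where most comparisons look like that receives the 35% master-cutoff, mirroring the Corynebacterium case in Figure 2.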

