A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case.

Yu GX, Snyder EE, Boyle SM, Crasta OR, Czar M, Mane SP, Purkayastha A, Sobral B, Setubal JC - Nucleic Acids Res. (2007)

Bottom Line: GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent.

Affiliation: Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA.

ABSTRACT
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.
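As an illustration of where missed gene calls would be looked for, the sketch below lists the intergenic regions left between annotated genes. It is not GenVar's implementation, which the abstract does not describe at this level. The function name `intergenic_regions`, the 1-based inclusive coordinates, the `min_length` parameter, and the treatment of the chromosome as linear are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def intergenic_regions(
    genes: List[Interval], genome_length: int, min_length: int = 1
) -> List[Interval]:
    """Return the gaps between annotated genes on a linear coordinate axis.

    Overlapping or nested gene intervals are merged implicitly. Circular
    chromosomes (a gap spanning the origin) are NOT handled in this sketch.
    """
    gaps: List[Interval] = []
    cursor = 1  # first position not yet covered by any gene
    for start, end in sorted(genes):
        # The gap runs from cursor to start - 1, so its length is start - cursor.
        if start - cursor >= min_length:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    # Trailing gap after the last gene, if long enough.
    if genome_length - cursor + 1 >= min_length:
        gaps.append((cursor, genome_length))
    return gaps


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    annotated = [(10, 20), (15, 30), (50, 60)]
    print(intergenic_regions(annotated, genome_length=100))
```

Each region returned this way could then be compared against proteins from related genomes with an alignment tool such as GeneWise. That step is left out here because it depends on the external program and its database.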

Figure 2: Missed gene calls revealed in the intergenic DNA regions from the four Brucella genomes. The bars show the total number of missed gene calls (blue) and the number of missed gene calls that are larger than 100 AA and have orthologs with assigned biological functions (yellow). BME stands for B. melitensis 16M; BSU for B. suis 1330; BA9941 for B. abortus 9-941; and BA2308 for B. abortus 2308. The letters I and II stand for chromosomes I and II, respectively.

Mentions: Many missed gene calls were detected, and their numbers vary from genome to genome (Figure 2). For example, B. melitensis 16M has about 185 missed gene calls whereas B. suis 1330 has 50. About 77% of all missed gene calls are 100 amino acids (AA) or shorter. This result is consistent with previous findings that differences in gene number among completely sequenced Brucella genomes are mainly caused by annotation discrepancies in the number of small genes (9). GenVar also found several missed genes longer than 100 AA, some of which have orthologs with assigned biological functions (Figure 2).
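The two quantities plotted in Figure 2 (total missed gene calls, and those longer than 100 AA whose orthologs have an assigned function) can be tallied as in the following sketch. The `MissedCall` record, its field names, and the helpers `summarize` and `fraction_short` are hypothetical and not part of GenVar. The sample records are invented and do not reproduce the published counts.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class MissedCall:
    """Hypothetical record for one missed gene call."""
    genome: str                  # label as used in Figure 2, e.g. "BME"
    chromosome: str              # "I" or "II"
    length_aa: int               # predicted protein length in amino acids
    ortholog_has_function: bool  # ortholog carries an assigned function


def summarize(
    calls: Iterable[MissedCall], threshold_aa: int = 100
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count calls per (genome, chromosome): all of them, and the subset
    longer than threshold_aa with a functionally annotated ortholog."""
    summary: Dict[Tuple[str, str], Dict[str, int]] = {}
    for call in calls:
        counts = summary.setdefault(
            (call.genome, call.chromosome),
            {"total": 0, "long_with_function": 0},
        )
        counts["total"] += 1
        if call.length_aa > threshold_aa and call.ortholog_has_function:
            counts["long_with_function"] += 1
    return summary


def fraction_short(calls: Iterable[MissedCall], threshold_aa: int = 100) -> float:
    """Fraction of calls with length less than or equal to threshold_aa."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(c.length_aa <= threshold_aa for c in calls) / len(calls)


if __name__ == "__main__":
    # Invented records, for illustration only.
    demo = [
        MissedCall("BME", "I", 45, False),
        MissedCall("BME", "I", 150, True),
        MissedCall("BME", "II", 100, True),
        MissedCall("BSU", "I", 220, False),
    ]
    print(summarize(demo))
    print(fraction_short(demo))
```

A call of exactly 100 AA is counted as short, which matches the "100 AA or shorter" wording in the text and the "larger than 100 AA" wording in the figure caption.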

