A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case.

Yu GX, Snyder EE, Boyle SM, Crasta OR, Czar M, Mane SP, Purkayastha A, Sobral B, Setubal JC - Nucleic Acids Res. (2007)

Bottom Line: GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent.


Affiliation: Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA.

ABSTRACT
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.



Figure 1: Data flow in GenVar showing its three constitutive conceptual steps. sspDB: species-specific database; gwpDB: gene-specific database.

Mentions: The first step establishes, for each query genome feature (QGF), a gene-specific protein database (gwpDB), which is the first input for GeneWise (Figure 1, panel I). A QGF is either a protein-coding gene from the existing genome annotation or the DNA region between two immediately adjacent protein-coding genes on the chromosomes (an intergenic DNA region). The gwpDB is constructed from a BLAST (15) search of the QGF against a species-specific protein database (sspDB). The sspDB consists of proteins from closely related genomes and from well-annotated genomes (see above). Consequently, the gwpDB of a QGF contains only a small number of proteins, yet covers all of its paralogs and orthologs from closely related genomes as well as from well-annotated protein sequences.
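The first step can be illustrated with a minimal Python sketch. It is not GenVar's code: the `Gene` record, the function names `enumerate_qgfs` and `build_gwpdb`, the `IG_` naming of intergenic regions, and the `max_evalue` and `max_hits` cutoffs are all assumptions made for illustration, since the text above gives no thresholds. The sketch also assumes a single replicon, ignores strand, and takes BLAST hits as already-parsed tuples rather than running BLAST itself.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    gene_id: str
    start: int
    end: int


def enumerate_qgfs(genes):
    """List every QGF: each annotated gene plus each gap between adjacent genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    qgfs = []
    for i, gene in enumerate(ordered):
        qgfs.append((gene.gene_id, gene.start, gene.end))
        if i + 1 < len(ordered):
            nxt = ordered[i + 1]
            # Overlapping or abutting genes leave no intergenic region.
            if nxt.start > gene.end + 1:
                name = f"IG_{gene.gene_id}_{nxt.gene_id}"
                qgfs.append((name, gene.end + 1, nxt.start - 1))
    return qgfs


def build_gwpdb(blast_rows, max_evalue=1e-5, max_hits=10):
    """Group BLAST hits by QGF and keep the top-scoring distinct proteins.

    blast_rows: iterable of (qgf_id, subject_protein_id, evalue, bitscore).
    The cutoffs are illustrative assumptions, not values from the paper.
    """
    best = defaultdict(dict)
    for qgf_id, subject, evalue, bitscore in blast_rows:
        if evalue > max_evalue:
            continue
        # Keep only the best bit score seen for each subject protein.
        if bitscore > best[qgf_id].get(subject, float("-inf")):
            best[qgf_id][subject] = bitscore
    gwpdb = {}
    for qgf_id, hits in best.items():
        ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
        gwpdb[qgf_id] = [subject for subject, _ in ranked[:max_hits]]
    return gwpdb


# Toy data: g2 and g3 overlap, so only one intergenic QGF is produced.
genes = [Gene("g1", 1, 300), Gene("g2", 451, 900), Gene("g3", 880, 1500)]
qgfs = enumerate_qgfs(genes)

rows = [
    ("g1", "P1", 1e-50, 200.0),
    ("g1", "P2", 1e-3, 40.0),      # fails the assumed e-value cutoff
    ("g1", "P1", 1e-40, 180.0),    # weaker duplicate hit to P1
    ("g1", "P4", 1e-30, 120.0),
    ("IG_g1_g2", "P3", 1e-20, 90.0),
]
gwpdb = build_gwpdb(rows)
```

Each value in `gwpdb` is the short list of protein identifiers that would be handed to GeneWise together with the corresponding QGF sequence. Keeping the list small is what makes a GeneWise comparison per QGF tractable, which matches the stated goal of a gwpDB with few proteins that still covers paralogs and orthologs.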

