Genome (re-)annotation and open-source annotation pipelines.

Siezen RJ, van Hijum SA - Microb Biotechnol (2010)


Affiliation: Kluyver Centre for Genomics of Industrial Fermentation, TI Food and Nutrition, 6700AN Wageningen, The Netherlands. r.siezen@cmbi.ru.nl


These days, more and more scientists are diving into genome sequencing projects, spurred by fast and cheap next‐generation sequencing technologies... By contrast, manual annotation is costly and time‐consuming... However, manual re‐annotation of genomes can significantly reduce the propagation of annotation errors and thus reduce the time spent on flawed research... Re‐annotations can be published in the literature or made available on websites... Examples of published re‐annotated genomes are unfortunately rare compared with the rapidly increasing number of sequenced genomes... Both the KEGG and MetaCyc databases describe the relation of gene products to metabolic pathways... Many of the afore‐mentioned databases contain annotation information that is generated by gene annotation pipelines... On‐line services (IGS, IMG, JCVI, IGS, RAST, xBASE, BASys) have the advantage of simplicity and little time investment... Curation of the annotation results requires constant user interaction to view the genes in context of different annotation information... Assigning genes to metabolic pathways can be done using the KAAS service (Table 3), which annotates gene products by assigning EC numbers based on amino acid similarity to gene products with known EC numbers... Once gene annotations have been determined, they can be checked for inaccurate or missing gene annotations using MICheck... A cited study describes an algorithm for policing gene annotations, which looks for genes that have poor genomic correlations with their network neighbours and are therefore likely to represent annotation errors... Each service provided multiple unique start sites and gene product calls as well as mistakes... They argue that the most efficient way to substantially decrease annotation error is to compare results from multiple annotation services...
Although wikis will not (and should not) supplant well‐curated model‐organism databases, for the majority of species they might represent our best chance for creating accurate, up‐to‐date genome annotation... And if you are really serious about updating your annotations, don't forget to re‐sequence your original strains using next‐generation sequencing, at least if you can still find them in your freezer!
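
The idea of comparing results from multiple annotation services can be illustrated with a minimal Python sketch. It is not taken from the article: the record layout, the service names and the choice to match gene calls by strand and stop coordinate (so that differing start sites show up as disagreements) are all assumptions made for illustration.

```python
from collections import defaultdict

def find_disagreements(calls_by_service):
    """Flag gene calls on which annotation services disagree.

    calls_by_service maps a (hypothetical) service name to a list of
    (strand, start, stop, product) tuples. Calls are matched across
    services by (strand, stop), an assumption of this sketch.
    """
    per_gene = defaultdict(dict)
    for service, calls in calls_by_service.items():
        for strand, start, stop, product in calls:
            per_gene[(strand, stop)][service] = (start, product)

    flagged = {}
    for key, calls in per_gene.items():
        starts = {start for start, _ in calls.values()}
        products = {product.lower() for _, product in calls.values()}
        missing = set(calls_by_service) - set(calls)
        # Disagreement: different start sites, different product names,
        # or a gene that at least one service did not call at all.
        if len(starts) > 1 or len(products) > 1 or missing:
            flagged[key] = calls
    return flagged

# Toy data, invented for this example.
calls = {
    "service_a": [("+", 100, 1000, "DNA polymerase"),
                  ("-", 2500, 1500, "hypothetical protein")],
    "service_b": [("+", 100, 1000, "DNA polymerase"),
                  ("-", 2440, 1500, "ABC transporter")],
}
flagged = find_disagreements(calls)
```

In this toy input only the gene ending at position 1500 on the minus strand is flagged, because the two services disagree on both its start site and its product. Flagged genes are candidates for manual curation, not automatic corrections.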

Fig. 2. Simplified prokaryotic genome database (PkGDB) relational model composed of three main components: sequence and annotation data (in green), annotation management (in blue) and functional predictions (in purple). Sequences and annotations come from public databanks, sequencing centres and specialized databases focused on model organisms. For genomes of interest, a (re)‐annotation process is performed using AMIGene (Bocs et al., 2003) and leads to the creation of new ‘Genomic Objects’. Each ‘Genomic Object’ and associated functional prediction results are stored in the PkGDB. The database architecture supports integration of automatic and manual annotations, and management of a history of annotations and sequence updates. Reproduced from Vallenet and colleagues (2006).
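
The three components named in the caption can be made concrete with a small relational sketch. The table and column names below are hypothetical and are not the actual PkGDB schema. They only show how genomic objects, a history of automatic and manual annotations, and functional predictions might relate to one another, using Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sequence and annotation data
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,
    organism TEXT NOT NULL,
    version  INTEGER NOT NULL
);
CREATE TABLE genomic_object (
    go_id     INTEGER PRIMARY KEY,
    seq_id    INTEGER NOT NULL REFERENCES sequence(seq_id),
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT NOT NULL
);
-- Annotation management: every change is a new row, so history is kept
CREATE TABLE annotation (
    annot_id INTEGER PRIMARY KEY,
    go_id    INTEGER NOT NULL REFERENCES genomic_object(go_id),
    product  TEXT NOT NULL,
    source   TEXT NOT NULL CHECK (source IN ('automatic', 'manual'))
);
-- Functional predictions attached to a genomic object
CREATE TABLE prediction (
    pred_id INTEGER PRIMARY KEY,
    go_id   INTEGER NOT NULL REFERENCES genomic_object(go_id),
    method  TEXT NOT NULL,
    result  TEXT NOT NULL
);
""")

# Toy rows, invented for this example.
conn.execute("INSERT INTO sequence VALUES (1, 'Example organism', 1)")
conn.execute("INSERT INTO genomic_object VALUES (1, 1, 100, 1000, '+')")
conn.execute("INSERT INTO annotation (go_id, product, source) "
             "VALUES (1, 'hypothetical protein', 'automatic')")
conn.execute("INSERT INTO annotation (go_id, product, source) "
             "VALUES (1, 'putative kinase', 'manual')")

# Oldest annotation first; the last row is the current annotation.
history = conn.execute(
    "SELECT product, source FROM annotation WHERE go_id = 1 ORDER BY annot_id"
).fetchall()
current = history[-1]
```

Appending a row for each change, instead of overwriting the previous one, is one simple way to keep both the automatic call and the later manual correction available.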

Many of the afore‐mentioned databases contain annotation information that is generated by gene annotation pipelines. Table 3 lists annotation pipelines that are either offered as a service or that can be downloaded and installed locally. Locally running pipelines (AGMIAL, DIYA, Restauro‐G, GenVar, SABIA, MAGPIE and GenDB) have the advantage that data can be kept confidential and that the annotation process is run on local hardware, ensuring reproducible annotation times. On‐line services (IGS, IMG, JCVI, IGS, RAST, xBASE, BASys) have the advantage of simplicity and little time investment. Curation of the annotation results requires constant user interaction to view the genes in context of different annotation information. The JCVI and IGS services both use the (formerly known as TIGR) Manatee pipeline, which also uses the TIGRFAMs to detect functional domains in protein sequences. They offer the user the possibility to view and alter annotations in the respective databases they use. Similar functionality is offered by MAGE (which uses the MicroScope database) (Fig. 2), IMG‐ER (which uses the IMG data model as its basis) and RAST (based on the Seed). The commercially available Pedant‐Pro pipeline is based on the Pedant annotation pipeline with various enhancements. We could not judge the usability of the MiGAP and ATCUG annotation pipelines, because the software was unavailable (ATCUG) or the website was in Japanese (MiGAP). The Taverna work‐flow system allows different web services to be linked, and has the advantage that it can be adapted by experienced bioinformaticians. Assigning genes to metabolic pathways can be done using the KAAS service (Table 3), which annotates gene products by assigning EC numbers based on amino acid similarity to gene products with known EC numbers.
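
The principle of assigning EC numbers by amino acid similarity can be sketched in a few lines of Python. This is not the KAAS algorithm or its interface. It is a simplified best-hit transfer under stated assumptions: the hit tuple layout, the function name `transfer_ec_numbers` and the 40% identity cut-off are all hypothetical, and the hits are assumed to come from a similarity search that has already been run.

```python
def transfer_ec_numbers(hits, known_ec, min_identity=40.0):
    """Assign each query the EC number of its best-scoring annotated hit.

    hits: iterable of (query, subject, percent_identity, bit_score).
    known_ec: maps subject identifiers to EC numbers.
    min_identity: hypothetical cut-off below which hits are ignored.
    """
    best = {}
    for query, subject, identity, bit_score in hits:
        if identity < min_identity or subject not in known_ec:
            continue  # too dissimilar, or the subject has no EC number
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return {query: known_ec[subject] for query, (subject, _) in best.items()}

# Toy data, invented for this example.
hits = [
    ("q1", "refA", 62.0, 310.0),
    ("q1", "refB", 48.0, 150.0),
    ("q2", "refC", 25.0, 40.0),   # below the identity cut-off
]
known_ec = {"refA": "2.7.7.7", "refB": "3.6.4.12", "refC": "1.1.1.1"}
assigned = transfer_ec_numbers(hits, known_ec)
```

Here `q1` inherits the EC number of `refA`, its highest-scoring hit, while `q2` stays unassigned because its only hit falls below the cut-off. Leaving weak hits unassigned is a deliberate choice in this sketch, since transferring function across low similarity is one way annotation errors propagate.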

