Genome (re-)annotation and open-source annotation pipelines.

Siezen RJ, van Hijum SA - Microb Biotechnol (2010)


Affiliation: Kluyver Centre for Genomics of Industrial Fermentation, TI Food and Nutrition, 6700AN Wageningen, The Netherlands. r.siezen@cmbi.ru.nl

AUTOMATICALLY GENERATED EXCERPT

These days, more and more scientists are diving into genome sequencing projects, urged by fast and cheap next‐generation sequencing technologies... By contrast, manual annotation is costly and time‐consuming... However, manual re‐annotation of genomes can significantly reduce the propagation of annotation errors and thus reduce the time spent on flawed research... Re‐annotations can be published in literature or made available on websites... Examples of published re‐annotated genomes are unfortunately rare compared with the rapidly increasing number of sequenced genomes... Both the KEGG and MetaCyc databases describe the relation of gene products to metabolic pathways... Many of the afore‐mentioned databases contain annotation information that is generated by gene annotation pipelines... On‐line services (IGS, IMG, JCVI, RAST, xBASE, BASys) have the advantage of simplicity and little time investment... Curation of the annotation results requires constant user interaction to view the genes in the context of different annotation information... Assigning genes to metabolic pathways can be done using the KAAS service (Table 3), which annotates gene products by assigning EC numbers based on amino acid similarity to gene products with known EC numbers... Once gene annotations have been determined, they can be checked for inaccurate or missing gene annotations using MICheck... An algorithm has also been described for policing gene annotations; it looks for genes that have poor genomic correlations with their network neighbours and are therefore likely to represent annotation errors... Each service provided multiple unique start sites and gene product calls as well as mistakes... They argue that the most efficient way to substantially decrease annotation error is to compare results from multiple annotation services...
Although wikis will not (and should not) supplant well‐curated model‐organism databases, for the majority of species they might represent our best chance for creating accurate, up‐to‐date genome annotation... And if you are really serious about updating your annotations, don't forget to re‐sequence your original strains using next‐generation sequencing, at least if you can still find them in your freezer!
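
The suggestion to compare results from multiple annotation services can be illustrated with a small sketch. It is not taken from the article: the `GeneCall` record, the `compare_calls` function and the service names are hypothetical, and the grouping rule (calls on the same strand that share a stop coordinate are treated as the same gene) is a simplifying assumption. Groups with differing start sites, differing product names, or a missing call from some service are flagged for manual review.

```python
from collections import defaultdict
from typing import NamedTuple


class GeneCall(NamedTuple):
    """One predicted gene as reported by one annotation service (hypothetical record)."""
    service: str   # label of the service that made the call
    strand: str    # "+" or "-"
    start: int     # genome coordinate of the predicted translation start
    stop: int      # genome coordinate of the stop codon
    product: str   # assigned gene product name


def compare_calls(calls):
    """Group calls by (strand, stop) and report disagreements between services."""
    all_services = {c.service for c in calls}
    groups = defaultdict(list)
    for call in calls:
        # Assumption: calls sharing strand and stop coordinate describe the same gene.
        groups[(call.strand, call.stop)].append(call)

    report = {}
    for key, group in groups.items():
        services = {c.service for c in group}
        starts = sorted({c.start for c in group})
        products = sorted({c.product for c in group})
        missing = sorted(all_services - services)
        report[key] = {
            "services": sorted(services),
            "starts": starts,
            "products": products,
            "missing_from": missing,
            # Any disagreement or missing call is a candidate for manual curation.
            "needs_review": len(starts) > 1 or len(products) > 1 or bool(missing),
        }
    return report


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = [
        GeneCall("svc_a", "+", 100, 400, "kinase"),
        GeneCall("svc_b", "+", 130, 400, "kinase"),
        GeneCall("svc_a", "-", 900, 500, "transporter"),
        GeneCall("svc_b", "-", 900, 500, "transporter"),
    ]
    for gene, summary in compare_calls(example).items():
        print(gene, summary)
```

Keying on the stop coordinate is a deliberate choice here: alternative start sites leave the stop codon unchanged, so calls that differ only in their start still land in the same group.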

Figure 1. A generalised flow chart of genome annotation. Statistical gene prediction: use of methods like GeneMark or Glimmer to predict protein‐coding genes. General database search: searching sequence databases (typically, NCBI NR) for sequence similarity, usually using blast. Specialized database search: searching domain databases (such as Pfam, SMART and CDD) for conserved domains; genome‐oriented databases (such as COGs) for identification of orthologous relationships and refined functional prediction; metabolic databases (such as KEGG) for metabolic pathway reconstruction; and other database searches. Prediction of structural features: prediction of signal peptides, transmembrane segments, coiled‐coil domains and other features in putative proteins.

Microbial genome annotation primarily involves identifying the genes (or, strictly speaking, the open reading frames: ORFs) encoded in the DNA sequence and deducing the function of the encoded protein and RNA products (Fig. 1). First, a gene finder such as Glimmer (Delcher et al., 1999) or GeneMark (Lukashin and Borodovsky, 1998) is applied to the genomic DNA sequence, producing a set of predicted protein‐coding genes. These programs are quite accurate, though not perfect. The next step is to take the set of predictions and search for hits against one or more protein and/or protein domain databases using blast (Altschul et al., 1997), HMMer (Eddy, 1998) or other programs. For each gene that has a significant match, the blast output together with the annotation of the hit can be used to assign a name and function to the protein. The accuracy of this step depends not only on the annotation software, but also on the quality of the annotations already in the reference database.
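
The two steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Glimmer, GeneMark or blast work: `find_orfs` is a naive scan for ATG‐to‐stop stretches in all six reading frames (real gene finders use statistical models), and `transfer_names` copies the description of the best hit below an E‐value cut‐off. The `Orf` and `Hit` records, the function names, and the default thresholds (`min_codons=3`, `max_evalue=1e-10`) are all hypothetical choices made for the example.

```python
from typing import NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" for the given sequence, "-" for its reverse complement
    start: int   # 0-based offset of the ATG on the scanned strand
    end: int     # offset just past the stop codon on the scanned strand


class Hit(NamedTuple):
    query: str        # identifier of the predicted gene
    description: str  # annotation of the database sequence that was hit
    evalue: float     # E-value reported by the similarity search


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=3):
    """Naive six-frame scan: first ATG in a frame up to the next in-frame stop."""
    orfs = []
    forward = dna.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Keep the ORF only if it has enough codons before the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append(Orf(strand, start, i + 3))
                    start = None
    return orfs


def transfer_names(hits, max_evalue=1e-10):
    """For each query, copy the description of its most significant hit."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant under the assumed cut-off
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    # Queries without a significant hit are simply absent from the result.
    return {query: hit.description for query, hit in best.items()}
```

Because `transfer_names` copies whatever description the best hit carries, an error in the reference database is passed straight on to the new genome, which is the propagation problem the text warns about.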

