MetaGene: prokaryotic gene finding from environmental genome shotgun sequences.

Noguchi H, Park J, Takagi T - Nucleic Acids Res. (2006)

Bottom Line: The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.


Affiliation: Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8562, Japan. hide@cb.k.u-tokyo.ac.jp

ABSTRACT
Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied to them. We have developed a prokaryotic gene-finding program, MetaGene, which uses di-codon frequencies estimated from the GC content of a given sequence, together with various other measures. MetaGene can predict a whole range of prokaryotic genes from anonymous genomic sequences only a few hundred bases long, with a sensitivity of 95% and a specificity of 90% on artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.
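
The abstract says MetaGene scores fragments with di-codon frequencies that depend on a sequence's GC content. As a rough illustration only, the sketch below computes the two raw inputs such a model would need: the GC content of a fragment and the counts of in-frame di-codons (6-mers advanced one codon at a time). This is not MetaGene's implementation; the function names `gc_content` and `dicodon_counts` and the `frame` parameter are hypothetical, and the step that maps GC content to expected di-codon frequencies is not shown.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def dicodon_counts(seq: str, frame: int = 0) -> Counter:
    """Count overlapping di-codons (adjacent codon pairs) in one reading frame.

    Starting at `frame` (0, 1 or 2), a 6-mer window is advanced three bases
    at a time, so each window is one codon plus the next codon. Windows that
    contain a non-ACGT character (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(frame, len(seq) - 5, 3):
        hexamer = seq[i:i + 6]
        if set(hexamer) <= set("ACGT"):
            counts[hexamer] += 1
    return counts


if __name__ == "__main__":
    fragment = "ATGGCCGCGTAA"
    print(gc_content(fragment))       # 7 of 12 bases are G or C
    print(dicodon_counts(fragment))   # ATGGCC, GCCGCG, GCGTAA once each
```

Counting in all three frames (and on the reverse complement) would be needed to score every possible coding frame of an anonymous fragment.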

Figure 3. Sensitivity and specificity of MetaGene for the sets of fixed-length artificial shotgun sequences. The average values for 12 species are indicated.

Raw shotgun sequences vary in length, although the average is about 700 bp. We applied MetaGene to sets of fixed-length fragments ranging from 100 to 1000 bases (sampled at 1× coverage of each genome) and examined how prediction performance changes with the length of the input sequence (Figure 3). As expected, prediction accuracy decreased as the input sequences became shorter. However, MetaGene retained relatively high accuracy on shorter fragments, and severe degradation was observed only for the 100 bp fragments. Because most raw sequence reads are longer than 500–600 bp, MetaGene can be expected to predict genes in metagenomic data with high reliability.
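
The length experiment can be sketched as two small steps: tile each genome into non-overlapping fragments of one fixed length (one pass from the start gives roughly 1× coverage), then summarize predictions on those fragments as sensitivity and specificity. The sketch assumes the definitions common in gene-finding benchmarks, sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP); this section reports the values but does not restate their definitions. The helper names are hypothetical, and the 100 bp step between lengths in the usage note is an assumption, since the text gives only the 100 to 1000 base range.

```python
from typing import List


def fixed_length_fragments(genome: str, length: int) -> List[str]:
    """Tile a genome into non-overlapping fragments of exactly `length` bases.

    A single tiling pass from position 0 covers the genome about once (1x);
    any trailing remainder shorter than `length` is discarded.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [genome[i:i + length]
            for i in range(0, len(genome) - length + 1, length)]


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true genes that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tp: int, fp: int) -> float:
    """Fraction of predicted genes that are real: TP / (TP + FP).

    This is the gene-finding sense of "specificity" (i.e. precision),
    not the TN / (TN + FP) definition used in diagnostic testing.
    """
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, `for length in range(100, 1001, 100): fragments = fixed_length_fragments(genome, length)` would build one fragment set per length, and a run with 95 correct predictions, 5 missed genes and 10 false predictions gives `sensitivity(95, 5) == 0.95` and `specificity(95, 10)` of about 0.905.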


MetaGene: prokaryotic gene finding from environmental genome shotgun sequences.

Noguchi H, Park J, Takagi T - Nucleic Acids Res. (2006)

Sensitivity and specificity of MetaGene for the sets of fixed-length artificial shotgun sequences. The average values for 12 species are indicated.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC1636498&req=5

fig3: Sensitivity and specificity of MetaGene for the sets of fixed-length artificial shotgun sequences. The average values for 12 species are indicated.
Mentions: Raw shotgun sequences vary in length, although the average is about 700 bp. We applied MetaGene to various fixed-length fragments ranging from 100 to 1000 bases (1× genome) and inspected the change of the prediction performance with the length of the input sequence (Figure 3). The prediction accuracies naturally decreased along with the shortening of the input sequences. However, MetaGene retained relatively high accuracies on smaller fragments, and extreme degradation of accuracy was observed only for the 100 bp fragments. Generally, most raw sequence reads are larger than 500–600 bp, which is to say that MetaGene can predict genes on the metagenomic data with high reliability.

Bottom Line: The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences.Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes.MetaGene can be applied to wide variety of metagenomic projects and expands the utility of metagenomics.

View Article: PubMed Central - PubMed

Affiliation: Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8562, Japan. hide@cb.k.u-tokyo.ac.jp

ABSTRACT
Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which utilizes di-codon frequencies estimated by the GC content of a given sequence with other various measures. MetaGene can predict a whole range of prokaryotic genes based on the anonymous genomic sequences of a few hundred bases, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to wide variety of metagenomic projects and expands the utility of metagenomics.

Show MeSH