Quantitative frame analysis and the annotation of GC-rich (and other) prokaryotic genomes. An application to Anaeromyxobacter dehalogenans.

Oden S, Brocchieri L - Bioinformatics (2015)

Bottom Line: Graphical representations of contrasts in GC usage among codon frame positions (frame analysis) provide evidence of genes missing from the annotations of prokaryotic genomes of high GC content, but the qualitative nature of visual frame analysis prevents its application on a genomic scale. We developed two quantitative methods for identifying and statistically characterizing three-base periodicities (hits) in sequence regions, associated with open reading frame structures. We applied the NPACT procedures to two recently annotated strains of the deltaproteobacterium Anaeromyxobacter dehalogenans, identifying in both genomes numerous conserved ORFs not included in the published annotation of coding regions.
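To make the idea of frame analysis concrete, the sketch below tallies GC at codon positions 1, 2 and 3 of a sequence, builds a sliding-window profile of those three fractions, and scores the contrast among positions with a standard chi-square test of homogeneity. This is a generic illustration under stated assumptions, not the NPACT statistics described in the paper: the function names, the window of 120 bases and the step of 30 bases are hypothetical choices, and the chi-square test is a swapped-in textbook technique. The 2×3 table (GC vs AT by codon position) has 2 degrees of freedom, for which the chi-square tail probability is exactly exp(−x/2), so no statistics library is needed.

```python
import math
from typing import List, Tuple

# Counts of (GC, total valid bases) at codon positions 1, 2 and 3.
FrameCounts = List[Tuple[int, int]]


def gc_by_frame_position(seq: str) -> FrameCounts:
    """Tally GC and total A/C/G/T bases per codon position, frame starting at seq[0]."""
    counts = [[0, 0], [0, 0], [0, 0]]
    for i, base in enumerate(seq.upper()):
        if base not in "ACGT":
            continue  # skip ambiguous bases such as N
        pos = i % 3
        counts[pos][1] += 1
        if base in "GC":
            counts[pos][0] += 1
    return [(gc, total) for gc, total in counts]


def frame_profile(seq: str, window: int = 120, step: int = 30):
    """Sliding-window GC fractions per codon position (hypothetical defaults).

    step is a multiple of 3, so every window starts in the same frame as seq[0].
    """
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        counts = gc_by_frame_position(seq[start:start + window])
        fractions = tuple(gc / total if total else 0.0 for gc, total in counts)
        profile.append((start, fractions))
    return profile


def periodicity_chi2(counts: FrameCounts) -> Tuple[float, float]:
    """Chi-square homogeneity test of GC vs AT across the three codon positions.

    A 2x3 table has (2-1)*(3-1) = 2 degrees of freedom; for df = 2 the
    chi-square survival function is exactly exp(-x/2).
    """
    n = sum(total for _, total in counts)
    total_gc = sum(gc for gc, _ in counts)
    total_at = n - total_gc
    if n == 0 or total_gc == 0 or total_at == 0 or any(t == 0 for _, t in counts):
        return 0.0, 1.0  # no contrast can be measured
    stat = 0.0
    for gc, total in counts:
        for observed, row_total in ((gc, total_gc), (total - gc, total_at)):
            expected = row_total * total / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    # A strongly periodic toy sequence: GC at positions 1 and 2, AT at position 3.
    stat, p = periodicity_chi2(gc_by_frame_position("GCA" * 100))
    print(stat, p)  # stat is about 300, p = exp(-150)
```

A p-value at or below a chosen threshold (the paper uses 10⁻²) would flag a window as periodic in this toy setup; how NPACT actually defines and scores hits is described in the paper itself.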

Affiliation: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, USA and Genetics Institute, University of Florida, Gainesville, FL 32610, USA.

Fig. 3. Frequency of genes with H-type or G-type hits (p ≤ 10⁻²) in sets of genes annotated in about 1000 published prokaryotic genome sequences. (A) Frequency within the sets of functionally described genes (‘Characterized’) and of ‘Hypothetical’ genes for different classes of sequence length. (B) Frequency by GC content within the Characterized and Hypothetical sets of genes. (C) Frequency within the Characterized set of genes, partitioned by sequence length and GC content.

Mentions: Partitioning the Characterized and Hypothetical datasets into classes of different length and GC content (Supplementary Table S5), we found, as expected, that the frequency of genes with significant three-base periodicities increased with sequence length (Fig. 3A). At the significance level α = 10⁻², compositional periodicities were identified in the vast majority (>99%) of Characterized genes with length ≥ 600 codons and in >90% of the sequences of length ≥ 250 codons. Significant periodicities were still observed in the majority (60.6%) of the sequences in the length range 50–99 codons and in almost one-third (31.9%) of sequences shorter than 50 codons. Mirroring the overall result, we found a lower frequency of hits among Hypothetical sequences than among Characterized sequences also when comparing sequences within the same length classes (Fig. 3A), demonstrating that the lower frequency of periodic ORFs in the Hypothetical set did not depend on the shorter average length of hypothetical genes (Table 1).
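The length-class comparison above amounts to binning genes by length, then computing, per category, the fraction whose best hit reaches p ≤ α. A minimal sketch follows, assuming each gene is summarized by a hypothetical `best_p` field (the smallest p-value among its hits, 1.0 if none). The text quotes the <50 and 50–99 classes plus cumulative thresholds at ≥250 and ≥600 codons; the remaining bin boundaries (100–249, 250–599) are assumptions standing in for the classes of Supplementary Table S5. The toy genes in the usage lines are invented inputs, not data from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ALPHA = 1e-2  # significance level quoted in the text

# Length classes in codons as half-open [lo, hi) intervals.
# Boundaries other than 50 and 100 are assumptions (see lead-in).
LENGTH_CLASSES: List[Tuple[int, float]] = [
    (0, 50), (50, 100), (100, 250), (250, 600), (600, float("inf")),
]


@dataclass
class Gene:
    length_codons: int
    best_p: float  # hypothetical: smallest p-value among the gene's hits (1.0 if none)
    category: str  # "Characterized" or "Hypothetical"


def class_label(lo: int, hi: float) -> str:
    """Human-readable label such as '50-99' or '>=600'."""
    return f">={lo}" if hi == float("inf") else f"{lo}-{int(hi) - 1}"


def hit_frequency_by_length(
    genes: List[Gene], alpha: float = ALPHA
) -> Dict[Tuple[str, str], float]:
    """Fraction of genes with a significant hit (p <= alpha), per category and length class."""
    tallies: Dict[Tuple[str, str], List[int]] = {}  # key -> [genes with hit, genes in class]
    for gene in genes:
        for lo, hi in LENGTH_CLASSES:
            if lo <= gene.length_codons < hi:
                key = (gene.category, class_label(lo, hi))
                tally = tallies.setdefault(key, [0, 0])
                tally[1] += 1
                if gene.best_p <= alpha:
                    tally[0] += 1
                break
    return {key: hits / total for key, (hits, total) in tallies.items()}


if __name__ == "__main__":
    toy = [
        Gene(700, 1e-5, "Characterized"),
        Gene(650, 0.5, "Characterized"),
        Gene(30, 1e-3, "Hypothetical"),
        Gene(40, 0.2, "Hypothetical"),
    ]
    for (category, length_class), freq in sorted(hit_frequency_by_length(toy).items()):
        print(category, length_class, freq)
```

Comparing the two categories within the same bin, rather than over all genes pooled, is what separates a genuine difference in periodicity from a simple effect of hypothetical genes being shorter on average.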