Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance.

Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J - BMC Bioinformatics (2012)

Bottom Line: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Affiliation: Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

ABSTRACT

Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.

Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Figure 3: Heterogeneity of positional background nucleotide compositions in E. coli (2,766 genes in M9 medium), S. cerevisiae (5,142 genes), D. melanogaster (1,651 genes), C. elegans (12,184 genes), and A. thaliana (1,332 genes). Heterogeneities of positional GC contents are represented by absolute differences between overall GC content and its positional contents: GC-GC1 for the first position (panel A), GC-GC2 for the second position (panel B), and GC-GC3 for the third position (panel C), respectively. Likewise, heterogeneities of positional purine content are absolute differences between overall purine (AG) content and its positional contents: AG-AG1 for the first position (panel D), AG-AG2 for the second position (panel E), and AG-AG3 for the third position (panel F), respectively.

Mentions: As noted, the correlation coefficients produced by CDC and scaled Nc' were similar in yeast but differed in the other species (Table 6). Since CDC takes positional GC and purine contents as BNC, whereas Nc' considers only GC content as BNC and ignores positional heterogeneity, this result can probably be explained by the relatively lower heterogeneity of positional BNCs in yeast. To further investigate this possibility, we examined the heterogeneities of positional GC and purine contents in these five species (Figure 3). Consistent with our expectation, heterogeneities of positional GC contents were indeed lower in yeast than in the other species (Figure 3A to 3C), especially at the second and third codon positions. In contrast, higher heterogeneities of positional GC contents were apparent in E. coli (Figure 3A and 3B for the first and second codon positions, respectively) and D. melanogaster (Figure 3B and 3C for the second and third codon positions, respectively). These results agree well with the observation that the difference in correlation coefficients between CDC and scaled Nc' was smaller in yeast than in E. coli or D. melanogaster (Table 6). As a consequence, CDC correlated more closely with scaled Nc' in yeast than in E. coli or D. melanogaster (Figure S13 in Additional file 1). In contrast to GC content, heterogeneities of positional purine contents were relatively smaller and similar among the five species tested, presumably because GC content ranges more broadly (20%--80%) than purine content (40%--60%) [48,58,59].
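
The heterogeneity values described for Figure 3 can be illustrated with a minimal Python sketch. It assumes the quantity is computed per coding sequence as the absolute difference between the overall content and the content at each codon position. The function names (positional_content, positional_heterogeneity) and the example sequence are hypothetical and are not taken from the paper or its software.

```python
from typing import Dict


def positional_content(cds: str, bases: str) -> Dict[str, float]:
    """Fraction of nucleotides in `bases`, overall and at each codon position.

    Hypothetical helper: assumes `cds` is an in-frame coding sequence.
    """
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence must contain at least one full codon")

    def fraction(s: str) -> float:
        return sum(ch in bases for ch in s) / len(s)

    return {
        "all": fraction(seq),
        "pos1": fraction(seq[0::3]),  # first codon positions
        "pos2": fraction(seq[1::3]),  # second codon positions
        "pos3": fraction(seq[2::3]),  # third codon positions
    }


def positional_heterogeneity(cds: str, bases: str = "GC") -> Dict[str, float]:
    """Absolute difference between overall and positional content.

    With bases="GC" this gives |GC-GC1|, |GC-GC2|, |GC-GC3|;
    with bases="AG" it gives the purine analogues |AG-AG1|, and so on.
    """
    content = positional_content(cds, bases)
    return {pos: abs(content["all"] - content[pos]) for pos in ("pos1", "pos2", "pos3")}


if __name__ == "__main__":
    example = "ATGGCCAAA"  # made-up toy sequence, three codons
    print(positional_heterogeneity(example, "GC"))
    print(positional_heterogeneity(example, "AG"))
```

Under this reading, a sequence whose three codon positions have similar GC content yields values near zero in all three panels, which is the pattern the text reports for yeast.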

