Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance.

Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J - BMC Bioinformatics (2012)

Bottom Line: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Affiliation: Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

ABSTRACT

Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions specific to each codon position, and it uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC on simulated sequences and empirical data and show that it outperforms extant measures by providing a more informative estimate of CUB and its statistical significance.
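The abstract names two ingredients, position-specific background composition and bootstrapped significance, but does not give the CDC formula. The sketch below illustrates both ideas under stated assumptions and is not the authors' published definition: expected codon usage is taken as the product of the nucleotide frequencies observed at codon positions 1, 2 and 3 (restricted to the 61 sense codons), bias is scored as 1 minus the cosine similarity between observed and expected usage, and significance comes from a parametric bootstrap that resamples codons from the expected distribution. The names `split_codons`, `expected_weights`, `deviation` and `bootstrap_p_value` are hypothetical.

```python
import itertools
import math
import random

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons of the standard genetic code.
SENSE_CODONS = [
    "".join(c) for c in itertools.product(BASES, repeat=3)
    if "".join(c) not in STOP_CODONS
]
SENSE_SET = set(SENSE_CODONS)


def split_codons(seq):
    """Split an in-frame coding sequence into sense codons (stop/ambiguous codons dropped)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in codons if c in SENSE_SET]


def expected_weights(codons):
    """Assumed expectation: product of base frequencies measured at each codon position."""
    n = len(codons)
    tallies = [{b: 0 for b in BASES} for _ in range(3)]
    for c in codons:
        for p in range(3):
            tallies[p][c[p]] += 1
    freqs = [{b: t[b] / n for b in BASES} for t in tallies]
    return [freqs[0][c[0]] * freqs[1][c[1]] * freqs[2][c[2]] for c in SENSE_CODONS]


def deviation(codons):
    """Hypothetical CDC-style score: 1 - cosine(observed usage, expected usage).

    Cosine similarity is scale-invariant, so raw counts and unnormalised
    weights can be compared directly; 0 means usage matches composition.
    """
    counts = {c: 0 for c in SENSE_CODONS}
    for c in codons:
        counts[c] += 1
    observed = [counts[c] for c in SENSE_CODONS]
    expected = expected_weights(codons)
    dot = sum(o * e for o, e in zip(observed, expected))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(e * e for e in expected))
    return 1.0 - dot / norm


def bootstrap_p_value(codons, replicates=1000, seed=0):
    """Share of null replicates scoring >= the observed score (with +1 correction).

    Each null replicate draws len(codons) codons from the expected distribution,
    i.e. composition-driven usage with no codon preference (assumed null model).
    """
    rng = random.Random(seed)
    observed = deviation(codons)
    weights = expected_weights(codons)
    hits = sum(
        deviation(rng.choices(SENSE_CODONS, weights=weights, k=len(codons))) >= observed
        for _ in range(replicates)
    )
    return (hits + 1) / (replicates + 1)


if __name__ == "__main__":
    codons = split_codons("GCTGAA" * 100)
    print(round(deviation(codons), 4))  # 0.2929
    print(bootstrap_p_value(codons))
```

In this toy input only GCT and GAA occur, although the positional composition alone would make GCT, GCA, GAT and GAA equally likely, so the score is well above zero and almost no null replicate reaches it. The cosine form was chosen here only because it is bounded and insensitive to sequence length scaling; the paper's own scoring may differ.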

Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.

Figure 2: Codon usage bias across a range of sequence lengths. Sequences were simulated with the four non-uniform positional composition sets: Low (panel A), Med-1 (panel B), Med-2 (panel C) and High (panel D). Each estimate is based on 10,000 replicate simulated sequences. The expected value of codon usage bias is zero in all cases examined.

Mentions: To examine how sequence length affects the reliability of CDC, we considered a wide range of sequence lengths from 100 to 3,000 codons. We made both GC and purine contents heterogeneous across the three codon positions using the four non-uniform positional composition sets (PCSs; Table 1). To reduce stochastic error, we repeated the simulation 10,000 times for each parameter combination, so each estimate was determined from 10,000 replicates. Overall, CDC performed better than Nc' and Nc across all sequence lengths examined (Figure 2). As the heterogeneity of positional background nucleotide composition (BNC) increased from low to high, CDC tended to show less bias, whereas Nc' and Nc produced increasingly biased estimates, especially when heterogeneity in positional BNCs was high (Figure 2D). For short sequences (<300 codons), CDC yielded much lower bias and smaller standard deviations (SDs) than Nc' and Nc, although all three measures produced somewhat biased estimates. Our results suggest that input sequences should be at least 100 codons long to obtain more reliable estimates of CUB. When sequence length fell below 100 codons, CDC still performed better than Nc' and Nc, although the biases of Nc' and Nc were in the opposite direction to those of CDC (Figure 2B to 2D; not apparent in Figure 2A). For long sequences, CDC produced less biased estimates and smaller SDs, whereas Nc' and Nc continued to yield more biased estimates and larger SDs.
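The simulation design above (sequences with no codon preference beyond position-specific composition, many replicates per length, bias and SD summarized per length) can be sketched as follows. Several assumptions are made that the excerpt does not confirm: per-position base frequencies are derived from GC content and purine content by taking G = GC × purine (one way to close the system, not necessarily the paper's); the composition values are hypothetical and are not the Table 1 PCSs; and the score is a hypothetical 1 minus cosine deviation from composition-expected usage, not CDC, Nc' or Nc. Because the true bias of every null sequence is zero, the mean score over replicates is the estimator's bias.

```python
import itertools
import math
import random
import statistics

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons of the standard genetic code.
SENSE_CODONS = [
    "".join(c) for c in itertools.product(BASES, repeat=3)
    if "".join(c) not in STOP_CODONS
]


def base_freqs(gc, purine):
    """Base frequencies from GC and purine content, assuming G = GC * purine."""
    return {
        "A": (1 - gc) * purine,
        "C": gc * (1 - purine),
        "G": gc * purine,
        "T": (1 - gc) * (1 - purine),
    }


def codon_weights(pos_freqs):
    """Unnormalised sense-codon probabilities implied by per-position base frequencies."""
    return [pos_freqs[0][c[0]] * pos_freqs[1][c[1]] * pos_freqs[2][c[2]] for c in SENSE_CODONS]


def deviation(codons):
    """Hypothetical score: 1 - cosine(observed counts, composition-expected weights)."""
    n = len(codons)
    tallies = [{b: 0 for b in BASES} for _ in range(3)]
    counts = {c: 0 for c in SENSE_CODONS}
    for c in codons:
        counts[c] += 1
        for p in range(3):
            tallies[p][c[p]] += 1
    expected = codon_weights([{b: t[b] / n for b in BASES} for t in tallies])
    observed = [counts[c] for c in SENSE_CODONS]
    dot = sum(o * e for o, e in zip(observed, expected))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(e * e for e in expected))
    return 1.0 - dot / norm


def bias_and_sd(length, pos_freqs, replicates, seed=1):
    """Mean (= bias, since the true value is 0) and SD of the score over null sequences."""
    rng = random.Random(seed)
    # Sampling only sense codons is equivalent to rejecting stop codons.
    weights = codon_weights(pos_freqs)
    scores = [
        deviation(rng.choices(SENSE_CODONS, weights=weights, k=length))
        for _ in range(replicates)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical (GC, purine) pairs for codon positions 1-3; not the Table 1 PCSs.
    HYPOTHETICAL_PCS = [base_freqs(0.60, 0.55), base_freqs(0.40, 0.45), base_freqs(0.70, 0.50)]
    for length in (100, 300, 1000):
        mean, sd = bias_and_sd(length, HYPOTHETICAL_PCS, replicates=200)
        print(f"{length:>5} codons: bias={mean:.4f}  SD={sd:.4f}")
```

With a score of this kind, finite-sample noise alone pushes short sequences away from the zero expectation, and both the bias and the SD shrink as length grows; this mirrors the kind of length dependence that Figure 2 is designed to expose, without reproducing the paper's numbers.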