Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance.

Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J - BMC Bioinformatics (2012)

Bottom Line: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Affiliation: Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

ABSTRACT

Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.

Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
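
The abstract says only that bootstrapping is used to assess the significance of CUB for a given sequence; it does not spell out the resampling scheme. The following is a minimal sketch of one plausible scheme, not the authors' implementation. It assumes (hypothetically) that the null model is the product of the sequence's own per-position base frequencies over the 61 sense codons of the standard genetic code, and that the bias score is one minus the cosine similarity between observed and expected codon usage. The names `expected_usage`, `deviation`, and `bootstrap_p_value` are illustrative only.

```python
import math
import random
from collections import Counter
from itertools import product

# Standard genetic code stop codons (assumption: standard code applies).
STOPS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(c) for c in product("ACGT", repeat=3) if "".join(c) not in STOPS]


def expected_usage(codons):
    """Expected sense-codon frequencies from per-position base frequencies."""
    n = len(codons)
    pos = [Counter(c[i] for c in codons) for i in range(3)]
    raw = {c: math.prod(pos[i][c[i]] / n for i in range(3)) for c in SENSE}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def deviation(codons, expected):
    """Illustrative bias score: 1 - cosine similarity (observed vs. expected)."""
    counts = Counter(codons)
    obs = [counts[c] / len(codons) for c in SENSE]
    exp = [expected[c] for c in SENSE]
    dot = sum(o * e for o, e in zip(obs, exp))
    norm = math.sqrt(sum(o * o for o in obs)) * math.sqrt(sum(e * e for e in exp))
    return max(0.0, 1.0 - dot / norm)


def bootstrap_p_value(codons, n_rep=200, seed=0):
    """Fraction of null replicates at least as biased as the observed sequence."""
    rng = random.Random(seed)
    expected = expected_usage(codons)
    observed = deviation(codons, expected)
    weights = [expected[c] for c in SENSE]
    hits = 0
    for _ in range(n_rep):
        # Replicate of the same length, drawn from the null codon usage.
        replicate = rng.choices(SENSE, weights=weights, k=len(codons))
        if deviation(replicate, expected) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_rep + 1)
```

For simplicity the sketch scores each replicate against the same expected usage rather than re-estimating it per replicate; a stricter version would recompute the null from each replicate's own positional composition.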


Figure 1: Codon usage bias across a variety of positional background nucleotide compositions. Heterogeneous positional background compositions were considered for GC content (panels A to C) and purine content (panels D to F), respectively. The expected values of codon usage bias are zero for all examined cases.

Mentions: A good measure should not deviate much from its expectation as the amount of data approaches infinity or any sufficiently large number. Thus, we first simulated sequences with a total of 100,000 codons using five positional composition sets (PCSs) (Table 1). Because both GC and purine contents govern background nucleotide composition (BNC), we fixed one of them to be uniform across the three codon positions and allowed the other to vary by position. We examined heterogeneous positional compositions for GC (Figure 1A to 1C) and purine (Figure 1D to 1F) contents, respectively. Consistent with expectations, when the PCS was uniform, CDC and scaled Nc' performed similarly, both taking a value close to 0 (Figure 1). When the heterogeneity of positional composition increased for GC content (Figure 1A to 1C), CDC continued to perform well for all cases examined, whereas scaled Nc' and scaled Nc generated biased estimates, especially in cases where there was high heterogeneity in positional BNCs. Similarly, when purine content had heterogeneous positional compositions (Figure 1D to 1F), CDC again exhibited much lower biases than scaled Nc' and scaled Nc. Since Nc ignores BNC, Nc' performed better than Nc when the PCS was non-uniform (Figure 1A, 1C, 1D and 1F) and they exhibited comparable estimates only in cases where the PCS was uniform (Figure 1B and 1E). These results agree well with those of Novembre [19]. In addition, when we set heterogeneous positional BNCs for both GC and purine contents, CDC consistently outperformed Nc' and Nc for nearly all the parameter combinations tested (Table 2).
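
The simulation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes that GC content and purine content act independently at each codon position, so that a positional composition set is three hypothetical `(gc, purine)` pairs, and that codons are drawn from the product of the positional base probabilities restricted to the 61 sense codons of the standard genetic code. The score `cosine_deviation` is a stand-in for a bias measure whose expectation is zero under the null; the excerpt does not give the CDC formula, so it should not be read as CDC itself.

```python
import itertools
import math
import random
from collections import Counter

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def base_probs(gc, purine):
    """Base probabilities at one position, assuming GC and purine are independent."""
    return {
        "A": (1 - gc) * purine,        # AT base, purine
        "T": (1 - gc) * (1 - purine),  # AT base, pyrimidine
        "G": gc * purine,              # GC base, purine
        "C": gc * (1 - purine),        # GC base, pyrimidine
    }


def expected_codon_usage(pcs):
    """Expected frequency of each sense codon for three (gc, purine) pairs."""
    probs = [base_probs(gc, pu) for gc, pu in pcs]
    raw = {}
    for bases in itertools.product(BASES, repeat=3):
        codon = "".join(bases)
        if codon not in STOPS:
            raw[codon] = math.prod(probs[i][bases[i]] for i in range(3))
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


def simulate_codons(pcs, n_codons, rng):
    """Draw n_codons sense codons with no bias beyond the positional background."""
    expected = expected_codon_usage(pcs)
    codons = list(expected)
    weights = [expected[c] for c in codons]
    return rng.choices(codons, weights=weights, k=n_codons)


def cosine_deviation(sample, expected):
    """Stand-in bias score: 1 - cosine similarity of observed vs. expected usage."""
    counts = Counter(sample)
    obs = [counts[c] / len(sample) for c in expected]
    exp = list(expected.values())
    dot = sum(o * e for o, e in zip(obs, exp))
    norm = math.sqrt(sum(o * o for o in obs)) * math.sqrt(sum(e * e for e in exp))
    return max(0.0, 1.0 - dot / norm)


if __name__ == "__main__":
    # Hypothetical PCS: heterogeneous GC content, uniform purine content.
    pcs = [(0.3, 0.5), (0.5, 0.5), (0.7, 0.5)]
    rng = random.Random(42)
    sample = simulate_codons(pcs, 100_000, rng)
    print(cosine_deviation(sample, expected_codon_usage(pcs)))
```

With 100,000 simulated codons, a score that accounts for the positional background should land close to 0, which is the behaviour the paragraph above reports for CDC; a measure that ignores the positional background would instead register the heterogeneity itself as bias.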

