Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance.

Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, Yu J - BMC Bioinformatics (2012)

Bottom Line: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Affiliation: Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

ABSTRACT

Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.
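
The idea of measuring bias against a position-specific background can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published CDC formula, which the abstract does not spell out. The sketch assumes that expected codon usage is the product of the nucleotide frequencies observed at each of the three codon positions (renormalized over the 61 sense codons of the standard genetic code), and that the deviation is one minus the cosine similarity between observed and expected usage. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
SENSE_CODONS = [c for c in ("".join(p) for p in product(BASES, repeat=3))
                if c not in STOPS]
SENSE_SET = set(SENSE_CODONS)


def split_codons(seq):
    """Split an in-frame sequence into sense codons; others are dropped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [c for c in codons if c in SENSE_SET]


def position_composition(codons):
    """Nucleotide frequencies at codon positions 1, 2 and 3."""
    comp = []
    for pos in range(3):
        counts = Counter(c[pos] for c in codons)
        total = sum(counts.values())
        comp.append({b: counts[b] / total for b in BASES})
    return comp


def expected_usage(comp):
    """Expected codon frequencies if the three positions were independent."""
    raw = [comp[0][c[0]] * comp[1][c[1]] * comp[2][c[2]] for c in SENSE_CODONS]
    total = sum(raw)
    return [r / total for r in raw]


def observed_usage(codons):
    """Observed codon frequencies, in the same order as SENSE_CODONS."""
    counts = Counter(codons)
    return [counts[c] / len(codons) for c in SENSE_CODONS]


def codon_deviation(seq):
    """Illustrative deviation: 1 - cosine(observed, expected). Assumed form."""
    codons = split_codons(seq)
    if not codons:
        raise ValueError("sequence contains no sense codons")
    obs = observed_usage(codons)
    exp = expected_usage(position_composition(codons))
    dot = sum(o * e for o, e in zip(obs, exp))
    norm = sqrt(sum(o * o for o in obs)) * sqrt(sum(e * e for e in exp))
    return 1.0 - dot / norm
```

Under these assumptions the value is 0 when observed usage matches the background expectation and grows toward 1 as usage departs from it. A sequence made of a single repeated codon scores 0, because its own position-specific composition already predicts that codon exactly.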

Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.


Figure 4: Comparison of CDC distributions between ribosomal protein (RP) genes (54 genes, ranging from 0.244 to 0.481) and all genes (4,144 genes, ranging from 0.046 to 0.550) in E. coli.

Mentions: We proceeded to calculate CDC values (as well as GC and purine contents) for all E. coli genes (Additional file 2). CDC values ranged from 0.046 to 0.550, and the mean and median values were 0.239 and 0.187, respectively (Figure 4). The majority of genes (69%) exhibited CDC values between 0.15 and 0.25. The gene with the highest CDC value is trpL, a key component in the attenuation system that controls the expression of the trpLEDCBA operon in response to tryptophan availability [60]. However, bootstrap resampling indicates that the CUB of the trpL gene is not statistically significant (P = 0.77), most likely due to its short length (14 aa), consistent with our simulation results that short sequences tend to have biased CUB estimates. The gene with the highest CDC value that is also statistically significant is rpmI (CDC = 0.481), which encodes ribosomal protein L35. In contrast, scaled Nc' and scaled Nc identified the rplL gene (encoding ribosomal protein L7/L12) and the eno gene (whose product catalyzes the interconversion of 2-phosphoglycerate and phosphoenolpyruvate), respectively, as having the strongest CUBs (Additional file 2).
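
The way a bootstrap can flag a high but non-significant value for a short gene can be illustrated with a small Python sketch. It is a hedged illustration rather than the authors' procedure: it assumes replicates are drawn by sampling each codon position independently from the sequence's own position-specific nucleotide composition, and it defines the P value as the fraction of replicates whose statistic is at least as large as the observed one. The helper names and the toy statistic `top_codon_share` are hypothetical, and stop codons are not excluded from replicates, for brevity.

```python
import random
from collections import Counter

BASES = "ACGT"


def position_composition(seq):
    """Nucleotide frequencies at codon positions 1, 2 and 3 of an in-frame sequence."""
    comp = []
    for pos in range(3):
        counts = Counter(seq[pos::3])
        total = sum(counts.values())
        comp.append([counts[b] / total for b in BASES])
    return comp


def random_replicate(comp, n_codons, rng):
    """Draw n_codons codons, sampling each codon position independently."""
    columns = [rng.choices(BASES, weights=comp[pos], k=n_codons)
               for pos in range(3)]
    return "".join(a + b + c for a, b, c in zip(*columns))


def bootstrap_p_value(seq, statistic, n_replicates=1000, seed=0):
    """Fraction of background replicates scoring at least the observed value."""
    rng = random.Random(seed)
    seq = seq.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # keep whole codons only
    observed = statistic(seq)
    comp = position_composition(seq)
    n_codons = len(seq) // 3
    hits = sum(
        statistic(random_replicate(comp, n_codons, rng)) >= observed
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def top_codon_share(seq):
    """Toy bias statistic (assumption): share of the most frequent codon."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return Counter(codons).most_common(1)[0][1] / len(codons)


# Usage: any callable that maps a sequence to a bias score can be plugged in.
p = bootstrap_p_value("ATGGCTAAAGCTGCTCGTGCTTAA", top_codon_share,
                      n_replicates=200, seed=1)
```

With only a handful of codons, random replicates often reach bias scores as large as the observed one by chance, so the P value tends to be large even when the raw score looks extreme. That is the qualitative behavior described for short sequences in the text.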

