The gene-specific codon counting database: a genome-based catalog of one-, two-, three-, four- and five-codon combinations present in Saccharomyces cerevisiae genes.

Tumu S, Patil A, Towns W, Dyavaiah M, Begley TJ - Database (Oxford) (2012)

Bottom Line: Using our developed Gene-Specific Codon Counting Database, we have identified extreme codon runs in specific genes. We have also demonstrated that specific codon combinations or usage patterns are over-represented in genes whose corresponding proteins belong to ribosome or translation-associated biological processes. Our resulting database provides a mineable list of multi-codon data and can be used to identify unique sequence runs and codon usage patterns in individual and functionally linked groups of genes.


Affiliation: Department of Computer Science, University at Albany, State University of New York, Albany, NY 12222, USA.

ABSTRACT
A codon consists of three nucleotides and functions during translation to dictate the insertion of a specific amino acid in a growing peptide or, in the case of stop codons, to specify the completion of protein synthesis. There are 64 possible single codons and there are 4096 double, 262 144 triple, 16 777 216 quadruple and 1 073 741 824 quintuple codon combinations available for use by specific genes and genomes. In order to evaluate the use of specific single, double, triple, quadruple and quintuple codon combinations in genes and gene networks, we have developed a codon counting tool and employed it to analyze 5780 Saccharomyces cerevisiae genes. We have also developed visualization approaches, including codon painting, combination and bar graphs, and have used them to identify distinct codon usage patterns in specific genes and groups of genes. Using our developed Gene-Specific Codon Counting Database, we have identified extreme codon runs in specific genes. We have also demonstrated that specific codon combinations or usage patterns are over-represented in genes whose corresponding proteins belong to ribosome or translation-associated biological processes. Our resulting database provides a mineable list of multi-codon data and can be used to identify unique sequence runs and codon usage patterns in individual and functionally linked groups of genes.
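The counting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the toy sequence and the choice to count overlapping windows that advance one codon at a time are all assumptions made for illustration.

```python
from collections import Counter


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def count_codon_runs(seq, n):
    """Count every window of n consecutive codons.

    Assumption: windows overlap and advance by one codon; the source does not
    state which windowing scheme the database uses.
    """
    codons = split_codons(seq)
    return Counter("-".join(codons[i:i + n]) for i in range(len(codons) - n + 1))


def possible_combinations(n):
    """Number of distinct n-codon combinations (64 codons per position)."""
    return 64 ** n


# Hypothetical toy sequence: ATG AGA AGA AGA AAA TAA
toy_gene = "ATGAGAAGAAGAAAATAA"
singles = count_codon_runs(toy_gene, 1)
doublets = count_codon_runs(toy_gene, 2)
```

Here `possible_combinations(n)` reproduces the totals quoted in the abstract (64, 4096, 262 144, 16 777 216 and 1 073 741 824 for n = 1 to 5), and `doublets["AGA-AGA"]` gives the number of same-same AGA doublets in the toy sequence.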


Figure 4: Heat map identifies groups of genes over-represented with specific codon–codon doublets. Z-scores, describing whether a gene is over- or under-represented with a codon doublet of identical codons, were hierarchically clustered using CLUSTER software. The 5780 gene sequences were filtered to remove any that did not register at least one Z-score >2 or <−2, leaving 4561 genes for clustering. The clustered data were visualized using TREEVIEW, with yellow and purple boxes depicting over-represented and under-represented doublets, respectively. The genes are organized vertically based on their similarity to each other across all codon–codon doublets, as defined by the clustering algorithm. Similarly, the codon–codon doublets are organized horizontally based on their similarity to each other, as defined by the clustering algorithm. (A) The arrow marks the column containing the gene length Z-score, with yellow and purple boxes representing genes larger or smaller than the genome average, respectively. The average gene length is 1401 base pairs with a standard deviation of 1122 base pairs. Cluster I (C1), specific to larger-than-average genes, is also marked. (B) Enlarged view of cluster II (C2).

One of the benefits of our database is that the resulting output can be used for genome-based analysis. We performed a global codon analysis with the specific goal of determining whether any group(s) of genes were over-represented with same–same codon doublets (e.g. AAA–AAA or AGA–AGA). We exported all the same–same codon doublet data from 64 Excel worksheets and generated Z-scores describing whether a specific doublet was over- or under-represented in a specific gene, relative to the genome average. We also included a quantitative description of whether the gene was smaller or larger than average, as one could expect to find more doublets in a larger sequence. We performed hierarchical cluster analysis (Figure 4) to test this assumption and, surprisingly, determined that in general, larger-than-average genes are not over-represented with same–same codon doublets (Cluster I). This was not the case for smaller-than-average genes, as cluster analysis revealed that some groups of smaller-than-average genes are over-represented with a specific set of same–same codon doublets. It is interesting to note that in Escherichia coli, the Trp operon uses codon doublets in the leader peptides to regulate the levels of tryptophan (24). Our data output suggests that some form of regulation based on same–same codon doublets may be occurring in S. cerevisiae. In general, cluster analysis did not identify genes as being over-represented with multiple codon doublets. One interesting trend that we observed specific to AGA–AGA (Cluster II) was that a large number of ribosomal proteins are over-represented with this arginine-specific doublet, suggesting that this sequence has some regulatory potential in translation-associated proteins.
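The Z-score step and the |Z| > 2 filter mentioned in the Figure 4 legend can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the source does not say whether raw counts or length-normalized frequencies were scored, or which standard deviation estimator was used, so the sketch scores raw counts with the population standard deviation. The gene names and counts are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(counts_by_gene):
    """Z-score of each gene's doublet count relative to the average over all genes.

    Assumption: raw counts and the population standard deviation are used.
    """
    values = list(counts_by_gene.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {gene: 0.0 for gene in counts_by_gene}
    return {gene: (count - mu) / sigma for gene, count in counts_by_gene.items()}


def genes_with_extreme_z(z_table, threshold=2.0):
    """Keep genes that register at least one Z-score above +threshold or below -threshold."""
    return [gene for gene, row in z_table.items()
            if any(abs(z) > threshold for z in row.values())]


def length_z(length_bp, mean_bp=1401, sd_bp=1122):
    """Gene-length Z-score using the mean and standard deviation quoted in the figure legend."""
    return (length_bp - mean_bp) / sd_bp


# Hypothetical AGA-AGA counts for six made-up genes.
aga_counts = {"g1": 0, "g2": 0, "g3": 0, "g4": 0, "g5": 0, "g6": 6}
aga_z = z_scores(aga_counts)
z_table = {gene: {"AGA-AGA": z} for gene, z in aga_z.items()}
kept = genes_with_extreme_z(z_table)
```

In this toy table only `g6` passes the filter. With the legend's values, a gene of 3645 base pairs sits exactly two standard deviations above the average length, since (3645 − 1401) / 1122 = 2.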

