COMIT: identification of noncoding motifs under selection in coding sequences.

Kural D, Ding Y, Wu J, Korpi AM, Chuang JH - Genome Biol. (2009)

Bottom Line: Coding nucleotide sequences contain myriad functions independent of their encoded protein sequences. COMIT concurs with diverse experimental datasets, including splicing enhancers, silencers, replication motifs, and microRNA targets, and predicts many novel functional motifs. Intriguingly, COMIT scores are well-correlated to scores uncalibrated for amino acids, suggesting that nucleotide motifs often override peptide-level constraints.


Affiliation: Department of Biology, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA. kural@bc.edu

ABSTRACT
Coding nucleotide sequences contain myriad functions independent of their encoded protein sequences. We present the COMIT algorithm to detect functional noncoding motifs in coding regions using sequence conservation, explicitly separating nucleotide from amino acid effects. COMIT concurs with diverse experimental datasets, including splicing enhancers, silencers, replication motifs, and microRNA targets, and predicts many novel functional motifs. Intriguingly, COMIT scores are well-correlated to scores uncalibrated for amino acids, suggesting that nucleotide motifs often override peptide-level constraints.

Figure 1: Schematic of the COMIT algorithm for identifying unusually conserved motifs in coding regions. The example illustrates how the score would be calculated for the motif ACAAAG, using genome-wide coding sequence alignments for two species. Each instance of the motif is identified in species 1, and the observed conservation - that is, whether all bases are identical among the two species - is calculated. The expected conservation at each instance is modeled from genome-wide frequencies of nucleotide-level conservation patterns conditional on the aligned amino acids. For each instance, the expected conservation is calculated from all possible ways in which the motif could be conserved at that location given the amino acids in each species, using values from Table 1 (typically some of these quantities, such as (H, Y)111, will be zero). The observed and expected conservation levels are compared and normalized to yield a conservation z-score for each motif.
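The comparison of observed and expected conservation described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the caption says the two quantities are "compared and normalized" but does not give the formula, so the normalization used here (the standard deviation of a sum of independent Bernoulli trials) is an assumption, not the published COMIT definition. The function name `conservation_z_score` and the per-instance probabilities are hypothetical; in the real method those probabilities would be derived from the Table 1 values, which are not reproduced here.

```python
import math


def conservation_z_score(instances):
    """Return a z-score comparing observed and expected motif conservation.

    `instances` is an iterable of (conserved, p_expected) pairs, one per
    occurrence of the motif in species 1:
      conserved  -- True if all bases are identical in the two species
      p_expected -- modeled probability of conservation at that instance,
                    given the aligned amino acids (hypothetical input here)

    ASSUMPTION: instances are treated as independent Bernoulli trials, so the
    variance of the conserved count is the sum of p * (1 - p).
    """
    observed = 0
    expected = 0.0
    variance = 0.0
    for conserved, p in instances:
        observed += 1 if conserved else 0
        expected += p
        variance += p * (1.0 - p)
    if variance == 0.0:
        raise ValueError("z-score undefined: expected conservation has zero variance")
    return (observed - expected) / math.sqrt(variance)


# Toy input: four instances of one motif, three of them conserved,
# each with an (invented) expected conservation probability of 0.5.
example_instances = [(True, 0.5), (True, 0.5), (True, 0.5), (False, 0.5)]
z = conservation_z_score(example_instances)
print(z)
```

In this toy input the observed count is 3, the expected count is 2, and the assumed variance is 1, so a motif conserved more often than the amino acid alignment alone predicts receives a positive score.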

Mentions: Using alignments of all mouse and human coding sequences, we calculated a COMIT z-score for the sequence conservation of all 4,096 6-mers. For each motif, we considered every instance in which it occurred in the coding regions of human, measured the number of conserved instances, and compared this to the number of conserved instances that would be expected given only the amino acid alignments. A schematic of this procedure is shown in Figure 1, with a full description provided in the Materials and methods. Out of these 4,096 motifs, we found 503 with a z-score > 15, suggesting that many motifs are subject to noncoding pressures. In contrast, one would expect < 10^-46 motifs to have z > 15 in a normal distribution. We performed a similar evaluation of motifs in the Saccharomyces cerevisiae-Saccharomyces bayanus comparison. For these yeasts we found 115 motifs with z > 10, compared to < 10^-19 expected, suggesting that yeast species contain many motifs under noncoding pressures in coding regions as well. Prokaryotes also exhibited an excess of motifs with strong conservation. When we applied COMIT to aligned Escherichia coli and Yersinia pestis coding regions, we found 17 hexamers with z > 20 and none with z < -10.
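The expected counts quoted above for a normal distribution can be checked with a short calculation. The sketch below assumes that the expectation is simply the number of hexamers multiplied by the one-sided tail probability of a standard normal, and that the yeast comparison uses the same 4,096 hexamers; neither assumption is stated explicitly in the excerpt. The names `normal_upper_tail` and `N_HEXAMERS` are introduced here for illustration only.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# All possible DNA 6-mers: four bases at each of six positions.
N_HEXAMERS = 4 ** 6

# Expected number of hexamers exceeding each threshold if the z-scores
# were independent draws from a standard normal (ASSUMPTION, see lead-in).
expected_above_15 = N_HEXAMERS * normal_upper_tail(15.0)
expected_above_10 = N_HEXAMERS * normal_upper_tail(10.0)

print(f"hexamers: {N_HEXAMERS}")
print(f"expected with z > 15: {expected_above_15:.2e}")
print(f"expected with z > 10: {expected_above_10:.2e}")
```

Under these assumptions both expectations fall below the bounds given in the text (10^-46 and 10^-19), against observed counts of 503 and 115 motifs.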

