Empirical codon substitution matrix.

Schneider A, Cannarozzi GM, Gonnet GH - BMC Bioinformatics (2005)

Bottom Line: From this data, both a probability matrix and a matrix of similarity scores were computed. They are 64 x 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 x 61 entries for the sense codons and 3 x 3 entries for the stop codons.


Affiliation: Institute of Computational Science, Swiss Federal Institute of Technology, Zurich, Switzerland. schneadr@inf.ethz.ch

ABSTRACT

Background: Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.

Results: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons, from which the number of substitutions between codons was counted. From this data, both a probability matrix and a matrix of similarity scores were computed. They are 64 x 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 x 61 entries for the sense codons and 3 x 3 entries for the stop codons.
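To make the counting step and the block diagonal structure concrete, the following is a minimal sketch, not the authors' procedure. It assumes the standard genetic code (stop codons TAA, TAG, TGA), counts each aligned pair symmetrically, and turns counts into probabilities by simple row normalisation. The function names `count_substitutions` and `to_probabilities` and the toy input pairs are hypothetical.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 codons
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
SENSE = [c for c in CODONS if c not in STOPS]  # the 61 sense codons


def count_substitutions(aligned_pairs):
    """Count codon pairs symmetrically, skipping sense <-> stop pairs."""
    counts = {a: {b: 0.0 for b in CODONS} for a in CODONS}
    for a, b in aligned_pairs:
        # Ignore pairs that mix a sense codon with a stop codon, so the
        # matrix keeps a 61 x 61 block and a separate 3 x 3 block.
        if (a in STOPS) != (b in STOPS):
            continue
        counts[a][b] += 1
        counts[b][a] += 1
    return counts


def to_probabilities(counts):
    """Row-normalise counts (a simplification chosen for illustration)."""
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return probs


# Toy input: a handful of made-up aligned codon pairs.
pairs = [("ATG", "ATG"), ("CTG", "CTT"), ("CTG", "CTG"),
         ("TAA", "TAG"), ("ATG", "TAA")]
counts = count_substitutions(pairs)
probs = to_probabilities(counts)
print(len(CODONS), len(SENSE))   # 64 61
print(probs["ATG"]["TAA"])       # 0.0, the sense/stop pair was skipped
```

Because mixed sense/stop pairs are never counted, every entry linking the two groups stays zero, which is what gives the matrix its block diagonal form when the codons are ordered with the stop codons last.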

Conclusion: The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distances or different amounts of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for uses that require DNA, such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.


Figure 1: f2 histogram. Histogram of the f2 values from the 17,502 alignments used to construct the matrix.

Mentions: Increasing amounts of genomic data would allow the construction and comparison of matrices from alignments with differing amounts of synonymous and non-synonymous substitutions, representing a two-dimensional array of matrices, where one dimension is the evolutionary distance and the other corresponds to the amount of synonymous change. Unfortunately, the current size of the nucleotide databases does not yet allow such a clustering of the available data. Instead, the alignments selected to construct the matrices were filtered to fall within a window of synonymous mutations, thereby excluding the most extreme values (see the Methods section for details). Figure 1 shows the distribution of the alignments' f2 values.
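The window filter described above can be sketched as follows. This is only an illustration: the text does not give the actual bounds or record layout, so the `lower` and `upper` values, the `f2` dictionary key, and the function name `filter_by_window` are all hypothetical.

```python
def filter_by_window(alignments, lower, upper):
    """Keep alignments whose f2 value lies within [lower, upper]."""
    return [a for a in alignments if lower <= a["f2"] <= upper]


# Made-up records; real f2 values would come from the alignments themselves.
alignments = [
    {"id": "aln1", "f2": 0.05},
    {"id": "aln2", "f2": 0.40},
    {"id": "aln3", "f2": 0.95},
]
kept = filter_by_window(alignments, lower=0.10, upper=0.90)  # hypothetical bounds
print([a["id"] for a in kept])  # ['aln2']
```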

