Separate base usages of genes located on the leading and lagging strands in Chlamydia muridarum revealed by the Z curve method.

Guo FB, Yu XJ - BMC Genomics (2007)

Bottom Line: Genes located on the two replication strands were found to have distinct base usages in the C. muridarum genome. Furthermore, for the genomes of Borrelia burgdorferi, Treponema pallidum, Chlamydia muridarum and Chlamydia trachomatis, in which distinct base and/or codon usages have been observed, closer phylogenetic distance is found compared with other bacterial genomes. However, the base composition asymmetry between the leading and lagging strands in C. muridarum is more significant than that in other bacteria.


Affiliation: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China. fbguo@uestc.edu.cn

ABSTRACT

Background: The nucleotide compositional asymmetry between the leading and lagging strands in bacterial genomes has been the subject of intensive study in the past few years. Notably, almost all bacterial genomes exhibit the same kind of base asymmetry. This work aims to investigate the strand biases in the Chlamydia muridarum genome and to show the potential of the Z curve method for quantitatively differentiating genes on the leading and lagging strands.

Results: The occurrence frequencies of bases in the protein-coding genes of the C. muridarum genome were analyzed by the Z curve method. Genes located on the two replication strands were found to have distinct base usages. Based on their positions in the 9-D space spanned by the variables u1 – u9 of the Z curve method, the K-means clustering algorithm assigns about 94% of genes to the correct strand, a few percent more than K-means based on the relative synonymous codon usage (RSCU). The base usage and codon usage analyses show that genes on the leading strand have more G than C and more T than A, particularly at the third codon position. For genes on the lagging strand the biases are reversed. The y components of the Z curves for the complete chromosome sequences show that the excesses of G over C and of T over A are more pronounced in the C. muridarum genome than in other bacterial genomes without separate base and/or codon usages. Furthermore, for the genomes of Borrelia burgdorferi, Treponema pallidum, Chlamydia muridarum and Chlamydia trachomatis, in which distinct base and/or codon usages have been observed, closer phylogenetic distance is found compared with other bacterial genomes.
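The abstract does not spell out how u1 – u9 are computed. The sketch below assumes the usual phase-specific Z curve definition, in which the x, y and z components are evaluated separately from the base frequencies at each of the three codon positions; the function name z_curve_variables and the example sequence are illustrative and do not come from the paper.

```python
def z_curve_variables(cds):
    """Return [u1, ..., u9] for a coding sequence read in frame.

    Assumed definition (phase-specific Z curve), for codon position i = 1, 2, 3
    with base frequencies a, c, g, t at that position:
        x_i = (a + g) - (c + t)   # purine vs pyrimidine
        y_i = (a + c) - (g + t)   # amino vs keto
        z_i = (a + t) - (g + c)   # weak vs strong hydrogen bonding
    ordered as u1..u3 = x1, y1, z1; u4..u6 = x2, y2, z2; u7..u9 = x3, y3, z3.
    """
    cds = cds.upper()
    u = []
    for phase in range(3):
        column = cds[phase::3]  # every base at this codon position
        # Count only unambiguous bases so that symbols such as N are ignored.
        n = sum(column.count(b) for b in "ACGT")
        if n == 0:
            raise ValueError("no A/C/G/T bases at codon position %d" % (phase + 1))
        a, c, g, t = (column.count(b) / n for b in "ACGT")
        u.append((a + g) - (c + t))
        u.append((a + c) - (g + t))
        u.append((a + t) - (g + c))
    return u


# Hypothetical three-codon example, far too short to be a real gene.
example = z_curve_variables("ATGGTTGTG")
```

Under this definition y can be rewritten as (a - t) + (c - g), so an excess of G over C and of T over A both push y below zero. A gene with the leading-strand bias described above would therefore be expected to show a negative u8, the y component at the third codon position.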

Conclusion: The nature of the strand biases of base composition in C. muridarum is similar to that in most other bacterial genomes. However, the base composition asymmetry between the leading and lagging strands in C. muridarum is more significant than that in other bacteria. The remarkable strand biases of G/C and T/A are presumed to be responsible for the appearance of separate base or codon usages in C. muridarum. Moreover, the closer phylogenetic distance among the four bacterial genomes with separate base and/or codon usages is inevitable rather than coincidental. The results also show that the Z curve method may be more sensitive than RSCU when used to quantitatively analyze DNA sequences.



Figure 1: The distribution of points based on the two most important axes using the correspondence analysis of the nine variables u1 – u9 for 909 genes of the C. muridarum genome. The genes transcribed on the leading strand are denoted by crosses, whereas the genes located on the lagging strand are denoted by open circles. The partition between the two categories of points in the plot shows that the genes located on the two strands of replication have separate base usages.

Mentions: For each of the 909 genes in C. muridarum, the nine variables u1 – u9 were calculated, which correspond to a point in a 9-D space. To visualize the distribution of these points, they were projected onto a 2-D plane spanned by the first and second principal axes using correspondence analysis (CA). The first and second principal axes account for 28.0% and 23.8% of the total inertia of the 9-D space, respectively, and no other axis accounts for more than 12%. Figure 1 shows the positions of the genes on the 2-D principal plane. As can be seen, the genes separate into two distinct clusters with little overlap, which indicates that the genes in the two clusters have different base usages.
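The idea of projecting points onto the leading principal axes and reporting the share of spread each axis carries can be illustrated with a small sketch. Note the substitution: the code below performs ordinary principal component analysis by power iteration, not the correspondence analysis used in the paper, so "fraction of variance" here only plays a role analogous to "fraction of inertia". The function name project_on_principal_axes and the toy data are hypothetical.

```python
import math


def project_on_principal_axes(points, n_axes=2, n_iter=500):
    """Project points onto their leading principal axes (plain PCA).

    Returns (coords, fractions): the coordinates of every point on the
    first n_axes principal axes, and the fraction of total variance
    carried by each of those axes.
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    # d x d covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))
    if total == 0:
        raise ValueError("all points are identical")

    axes, fractions = [], []
    for _ in range(n_axes):
        # Asymmetric start vector; power iteration would stall only if it
        # happened to be exactly orthogonal to the dominant axis.
        v = [float(j + 1) for j in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        # Rayleigh quotient: the variance along the axis just found.
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
        axes.append(v)
        fractions.append(eigval / total)
        # Deflation: remove this axis so the next pass finds the following one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    coords = [[sum(r[j] * ax[j] for j in range(d)) for ax in axes]
              for r in centred]
    return coords, fractions


# Toy 3-D data; a real analysis would pass one 9-D point (u1..u9) per gene.
toy = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0)]
coords, fractions = project_on_principal_axes(toy)
```

In the paper's analysis the two leading axes together account for 28.0% + 23.8% = 51.8% of the total inertia, so roughly half of the spread in the 9-D space is not visible in the 2-D plot.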

