CG dinucleotide clustering is a species-specific property of the genome.

Glass JL, Thompson RF, Khulan B, Figueroa ME, Olivier EN, Oakley EJ, Van Zant G, Bouhassira EE, Melnick A, Golden A, Fazzari MJ, Greally JM - Nucleic Acids Res. (2007)

Bottom Line: We also show that the CG clusters co-localize in the human genome with hypomethylated loci and annotated transcription start sites to a greater extent than annotations produced by prior CpG island definitions. Moreover, this new approach allows CG clusters to be identified in a species-specific manner, revealing a degree of orthologous conservation that is not revealed by current base compositional approaches. Finally, our approach is able to identify methylating genomes (such as Takifugu rubripes) that lack CG clustering entirely, in which it is inappropriate to annotate CpG islands or CG clusters.


Affiliation: Department of Molecular Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA, Division of Hematology/Oncology, University of Kentucky, Markey Cancer Center, 800 Rose Street, Lexington KY 40536, USA.

ABSTRACT
Cytosines at cytosine-guanine (CG) dinucleotides are the near-exclusive target of DNA methyltransferases in mammalian genomes. Spontaneous deamination of methylcytosine to thymine makes methylated cytosines unusually susceptible to mutation and consequent depletion. The loci where CG dinucleotides remain relatively enriched, presumably due to their unmethylated status during the germ cell cycle, have been referred to as CpG islands. Currently, CpG islands are solely defined by base compositional criteria, allowing annotation of any sequenced genome. Using a novel bioinformatic approach, we show that CG clusters can be identified as an inherent property of genomic sequence without imposing a base compositional a priori assumption. We also show that the CG clusters co-localize in the human genome with hypomethylated loci and annotated transcription start sites to a greater extent than annotations produced by prior CpG island definitions. Moreover, this new approach allows CG clusters to be identified in a species-specific manner, revealing a degree of orthologous conservation that is not revealed by current base compositional approaches. Finally, our approach is able to identify methylating genomes (such as Takifugu rubripes) that lack CG clustering entirely, in which it is inappropriate to annotate CpG islands or CG clusters.
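For context, the "base compositional criteria" mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state thresholds; the defaults below (length of at least 200 bp, G+C fraction of at least 0.5, observed/expected CG ratio of at least 0.6) are the commonly cited values and are an assumption here, as are the function names `cpg_stats` and `meets_base_composition`.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cg = seq.count("CG")  # CG cannot overlap itself, so this count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CG count under independence is (C * G) / N,
    # so observed/expected = (CG * N) / (C * G).
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def meets_base_composition(
    seq: str,
    min_len: int = 200,        # assumed threshold, not taken from the text
    min_gc: float = 0.5,       # assumed threshold, not taken from the text
    min_obs_exp: float = 0.6,  # assumed threshold, not taken from the text
) -> bool:
    """Check one candidate sequence against base compositional criteria."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

A check of this kind depends only on fixed thresholds, which is why it can be applied to any sequenced genome regardless of whether that genome actually shows CG clustering.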

Figure 6: The mouse genome has different CG clustering characteristics than those of the human genome. The optimization curve characteristics for mouse are clearly different from those for human (a). The optimal mouse annotation contains fragments no longer than 585 nt with 24 or more CGs per fragment, fewer CGs in a longer stretch of DNA than for the human genome. In panel (b) it is again apparent that base composition criteria alone will fail to recognize a substantial proportion of CG-dense loci in this species.

Mentions: When we performed the CG clustering analysis of the mouse genome, we found that it also generates two populations with distinct CG density characteristics, but that the optimal CG cluster definition for the mouse genome differs from that of the human: 24 or more CG dinucleotides in a sequence of no more than 585 bp (Figure 6). By comparison, human CG clusters consist of 27 CGs in no more than 571 bp. The total number of CG clusters in the mouse genome was strikingly similar to that in the human genome (42 971 and 44 165, respectively, Table 1). In addition, when we re-analyzed a sample of 23 loci originally published to demonstrate the failure of CpG island conservation between these species (15), we found that while only 18 conserve CpG islands, 22 out of 23 conserve CG clusters, the single exception in this limited sample being the alpha globin orthologs (HBA1/Hba-a1). We extended this study to test conservation of each annotation genome-wide. Of the 27 801 CpG islands annotated at the UCSC Genome Browser, 14 452 have orthologous sequences with CpG islands in the mouse genome, while there exist 19 410 sites of conserved CG clustering (Table 1). When studied using our genome-specific annotations, clustered CG dinucleotides are demonstrably much more conserved between species than previously appreciated.
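The species-specific definitions quoted above (at least 24 CGs within 585 bp for mouse, at least 27 CGs within 571 bp for human) can be applied to a sequence with a simple scan. The following is a minimal sketch only: the thresholds come from the text, but the scanning and merging strategy and the names `cg_positions` and `find_cg_windows` are assumptions for illustration, not the authors' published procedure.

```python
import re
from typing import List, Tuple

# (minimum CG count, maximum span in bp), as quoted in the text
MOUSE = (24, 585)
HUMAN = (27, 571)


def cg_positions(seq: str) -> List[int]:
    """Return the 0-based start index of every CG dinucleotide."""
    return [m.start() for m in re.finditer("CG", seq.upper())]


def find_cg_windows(seq: str, min_cg: int, max_len: int) -> List[Tuple[int, int]]:
    """Return merged half-open intervals (start, end) covering every run of
    `min_cg` consecutive CGs that fits within `max_len` bases.

    Assumption: overlapping qualifying windows are merged, so a returned
    interval may be longer than `max_len`.
    """
    pos = cg_positions(seq)
    merged: List[Tuple[int, int]] = []
    for i in range(len(pos) - min_cg + 1):
        start = pos[i]
        end = pos[i + min_cg - 1] + 2  # +2 to include the last CG itself
        if end - start > max_len:
            continue  # these min_cg CGs are spread too thinly
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, `find_cg_windows(seq, *MOUSE)` and `find_cg_windows(seq, *HUMAN)` scan the same sequence under the two definitions. Because the mouse definition accepts fewer CGs over a longer stretch, a locus can qualify under the mouse thresholds while failing the human ones.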

