Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions.

Houseman EA, Christensen BC, Yeh RF, Marsit CJ, Karagas MR, Wrensch M, Nelson HH, Wiemels J, Zheng S, Wiencke JK, Kelsey KT - BMC Bioinformatics (2008)

Bottom Line: Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age. Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.

Affiliation: Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, 02115, USA. ahousema@hsph.harvard.edu

ABSTRACT

Background: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, a well-recognized mechanism of epigenetic gene silencing that often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.

Results: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations showing that the method is more reliable than competing nonparametric clustering approaches and at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age.

Conclusion: Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
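The Results paragraph describes recursive partitioning over a beta mixture model. The Python sketch below illustrates that general idea under assumptions of our own; it is not the authors' published algorithm. It assumes methylation values are independent beta variables across loci given the class. Each node fits a single two-class mixture by EM, using a weighted method-of-moments M-step as a simplification in place of maximum likelihood. A split is kept only when it lowers the BIC relative to a one-class fit, and samples are passed to child nodes by hard assignment. The function names (`rpmm_sketch`, `two_class_em`, `fit_beta_mom`) and the `min_size` parameter are hypothetical.

```python
# Illustrative sketch of recursive partitioning over a beta mixture; NOT the authors' implementation.
import math
import numpy as np

EPS = 1e-4  # keeps values strictly inside (0, 1) so log(y) and log(1 - y) stay finite


def fit_beta_mom(y, w):
    """Weighted method-of-moments beta fit for each locus (column) of y (a simplification of ML)."""
    w = w / w.sum()
    m = np.clip(w @ y, EPS, 1 - EPS)                            # weighted mean per locus
    v = np.clip(w @ (y - m) ** 2, 1e-6, m * (1 - m) * 0.999)    # a beta needs var < m(1 - m)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common                         # shape parameters (a, b)


def beta_loglik(y, a, b):
    """Log-likelihood of each sample (row), treating loci as independent given the class."""
    log_b = np.array([math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
                      for p, q in zip(a, b)])
    return ((a - 1) * np.log(y) + (b - 1) * np.log(1 - y) - log_b).sum(axis=1)


def two_class_em(y, n_iter=100):
    """Fit a two-class beta mixture by EM; return P(class 1 | sample) and the mixture log-likelihood."""
    centered = y - y.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    r = np.where(centered @ vt[0] > 0, 0.9, 0.1)                # start from the sign of the first PC
    for _ in range(n_iter):
        pi1 = np.clip(r.mean(), 1e-6, 1 - 1e-6)                 # mixing proportion of class 1
        ll0 = math.log(1 - pi1) + beta_loglik(y, *fit_beta_mom(y, 1 - r + 1e-9))
        ll1 = math.log(pi1) + beta_loglik(y, *fit_beta_mom(y, r + 1e-9))
        top = np.maximum(ll0, ll1)                              # log-sum-exp for numerical stability
        total = top + np.log(np.exp(ll0 - top) + np.exp(ll1 - top))
        r = np.exp(ll1 - total)                                 # E-step: posterior of class 1
    return r, total.sum()


def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)


def rpmm_sketch(y, idx=None, label="r", min_size=5):
    """Recursively split samples while a two-class fit beats one class by BIC; returns {leaf: indices}."""
    y = np.clip(y, EPS, 1 - EPS)
    idx = np.arange(len(y)) if idx is None else idx
    sub = y[idx]
    n, j = sub.shape
    if n >= 2 * min_size:
        ll_one = beta_loglik(sub, *fit_beta_mom(sub, np.ones(n))).sum()
        r, ll_two = two_class_em(sub)
        split = r > 0.5                                          # hard assignment (a simplification)
        if (min_size <= split.sum() <= n - min_size
                and bic(ll_two, 4 * j + 1, n) < bic(ll_one, 2 * j, n)):
            return {**rpmm_sketch(y, idx[~split], label + "L", min_size),
                    **rpmm_sketch(y, idx[split], label + "R", min_size)}
    return {label: idx}
```

For example, with `rng = np.random.default_rng(1)` and `y = np.vstack([rng.beta(8, 2, (20, 30)), rng.beta(2, 8, (20, 30))])`, calling `rpmm_sketch(y)` should place the two simulated groups in separate leaves. Leaf labels such as `"rL"` and `"rR"` record the path of left and right splits from the root; the BIC check is what stops the recursion once a node no longer benefits from a split.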

Figure 2: Unadjusted Average Beta values obtained from Illumina GoldenGate methylation platform for 1413 tumor suppressor loci on 217 normal tissue samples. Yellow = 1.0, black = 0.5, blue = 0.0. Autosomal chromosomes are grouped to aid visualization. For each chromosome group, loci are ordered by their position in a dendrogram produced by hierarchical clustering. Similarly, within tissue sample groups, samples are ordered by their position in a hierarchical clustering dendrogram.
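The caption orders loci and samples by their position in a hierarchical-clustering dendrogram within each group. A minimal sketch of that ordering with SciPy follows. It assumes average linkage with Euclidean distance, because the caption does not state which linkage or distance was used; `dendrogram_order` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def dendrogram_order(values, groups, method="average", metric="euclidean"):
    """Row order that keeps groups contiguous and sorts rows within each group by dendrogram leaf order.

    Assumption: the caption does not give the linkage method or distance metric;
    average linkage with Euclidean distance is used here only as a default.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(groups)
    order = []
    for g in dict.fromkeys(groups):              # groups in order of first appearance
        members = np.flatnonzero(labels == g)
        if len(members) > 1:                     # linkage needs at least two observations
            z = linkage(values[members], method=method, metric=metric)
            members = members[leaves_list(z)]    # reorder members by dendrogram leaf position
        order.extend(members.tolist())
    return np.array(order)
```

Applied to a samples-by-loci matrix with tissue-type labels, this orders the rows; applied to the transpose with a per-locus chromosome-group label vector (hypothetical here), it orders the columns.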

Mentions: Our proposed method is motivated by methylation array data obtained from normal tissue. We extracted DNA from 217 normal tissue samples, modified it with sodium bisulfite, and processed it on the Illumina GoldenGate methylation platform. Tissues were assembled by a collaborative, multi-institutional network of principal investigators conducting molecular epidemiologic studies of human cancer. Participating institutions include the International Mesothelioma Program at Brigham and Women's Hospital, Brown University, Dartmouth-Hitchcock Medical Center, University of California – San Francisco, Brain Tumor SPORE program, University of Massachusetts – Lowell, and the University of Minnesota. Tissues were obtained through Institutional Review Board-approved studies already underway at these institutions, or purchased from the National Disease Research Interchange (NDRI). A variety of normal tissue types were assembled: bladder (n = 5), blood (n = 85), brain (n = 12), cervix (n = 3), head and neck (n = 11), kidney (n = 6), lung (n = 53), placenta (n = 19), pleura (n = 18), and small intestine (n = 5). All tissue samples were from adults except 55 Guthrie card-derived blood samples from newborns. Figure 2 illustrates the methylation pattern for all 217 subjects and the 1413 loci that passed quality-assurance procedures (median detection P-value < 0.05).
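Two checks follow directly from this paragraph: the listed tissue counts sum to 217, and a locus is retained when its median detection P-value across samples is below 0.05. The sketch assumes detection P-values are held in a NumPy array with samples in rows and loci in columns; `passing_loci` and `TISSUE_COUNTS` are illustrative names, not part of any published code.

```python
import numpy as np

# Normal tissue counts as listed in the text (the 85 blood samples include the 55 newborn Guthrie card samples).
TISSUE_COUNTS = {
    "bladder": 5, "blood": 85, "brain": 12, "cervix": 3, "head and neck": 11,
    "kidney": 6, "lung": 53, "placenta": 19, "pleura": 18, "small intestine": 5,
}
assert sum(TISSUE_COUNTS.values()) == 217


def passing_loci(detection_p, threshold=0.05):
    """Boolean mask of loci (columns) whose median detection P-value across samples is below threshold.

    Assumes detection_p has shape (n_samples, n_loci).
    """
    return np.median(np.asarray(detection_p), axis=0) < threshold
```

With a matching Beta-value matrix `y`, `y[:, passing_loci(detection_p)]` would keep only the passing columns; the text reports 1413 loci passing this filter.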