A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments.

Choi H, Shen R, Chinnaiyan AM, Ghosh D - BMC Bioinformatics (2007)

Bottom Line: We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm.


Affiliation: Department of Statistics and Huck Institute for Life Sciences, Penn State University, University Park, PA, USA. hwchoi@umich.edu

ABSTRACT

Background: With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies.

Results: In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer.

Conclusion: The statistical methods described in the paper are available in the R package metaArray (version 1.8.1), distributed through Bioconductor (http://www.bioconductor.org/).
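The EM-based estimation of POE summarized in the abstract can be illustrated with a deliberately simplified, single-gene sketch. It assumes a two-component mixture with a latent indicator per sample: a baseline sample follows Normal(mu, sigma^2), and an outlying sample follows a uniform distribution over the observed range of that gene. The signed posterior probability of the uniform component then acts as a POE-like value in [-1, 1]. This is an illustrative assumption, not the metaArray implementation or the paper's exact model. The function names (`poe_em_sketch`, `uniform_responsibilities`, `normal_pdf`), the starting values, and the convergence rule are all hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_responsibilities(x, pi, mu, sigma, u_density):
    """E-step: P(sample j comes from the uniform 'outlying' component | x_j)."""
    resp = []
    for v in x:
        a = pi * u_density
        b = (1.0 - pi) * normal_pdf(v, mu, sigma)
        resp.append(a / (a + b))
    return resp


def poe_em_sketch(x, n_iter=500, tol=1e-10):
    """Hypothetical sketch: fit pi*Uniform(min x, max x) + (1 - pi)*Normal(mu, sigma^2).

    Returns (poe, pi, mu, sigma), where poe[j] = sign(x[j] - mu) * P(z_j = 1 | x[j])
    lies in [-1, 1]: near +1 over-expressed, near -1 under-expressed, near 0 baseline.
    """
    lo, hi = min(x), max(x)
    # Assumption: the uniform support is fixed at the observed range of the gene.
    u_density = 1.0 / (hi - lo)  # requires at least two distinct values
    # Robust starting values: median for the centre, scaled MAD for the spread.
    mu = statistics.median(x)
    sigma = max(1.4826 * statistics.median(abs(v - mu) for v in x), 1e-6)
    pi = 0.1
    for _ in range(n_iter):
        r = uniform_responsibilities(x, pi, mu, sigma, u_density)
        # M-step: each sample is weighted by its baseline probability 1 - r_j.
        w = [1.0 - rj for rj in r]
        sw = max(sum(w), 1e-12)
        new_pi = sum(r) / len(x)
        new_mu = sum(wj * v for wj, v in zip(w, x)) / sw
        new_var = sum(wj * (v - new_mu) ** 2 for wj, v in zip(w, x)) / sw
        new_sigma = max(math.sqrt(new_var), 1e-6)
        delta = abs(new_pi - pi) + abs(new_mu - mu) + abs(new_sigma - sigma)
        pi, mu, sigma = new_pi, new_mu, new_sigma
        if delta < tol:
            break
    # Final E-step with the fitted parameters, then attach the direction of deviation.
    r = uniform_responsibilities(x, pi, mu, sigma, u_density)
    poe = [math.copysign(rj, v - mu) for rj, v in zip(r, x)]
    return poe, pi, mu, sigma


if __name__ == "__main__":
    # Hypothetical expression values for one gene: ten baseline samples, two outliers.
    gene = [-0.3, -0.1, 0.0, 0.1, 0.2, -0.2, 0.05, -0.05, 0.15, -0.15, 5.0, -4.5]
    poe, pi, mu, sigma = poe_em_sketch(gene)
    print([round(p, 2) for p in poe])
```

Each output is a signed probability rather than a raw intensity. For that reason, values computed separately on different platforms fall on the same [-1, 1] scale, which is the kind of platform-independent latent quantity the abstract describes.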


Figure 10: POE MCMC for all three datasets. Hierarchical clustering of tumors of all three studies using the POE MCMC signature. The expression is on the POE scale.

Mentions: To assess classification performance, we performed hierarchical clustering of the tissue samples from each individual study using the ES signature. Figures 6 through 8 show the heatmaps of the ES signature in the individual studies, together with the clustering trees. These were drawn separately because, unlike POE values, raw-scale expression data cannot be combined directly across studies. Figures 9 and 10 show the heatmaps of the POE EM and POE MCMC signatures, respectively. To highlight the sample labels, a yellow/blue color strip was added above the dendrograms in Figures 6 through 10. This strip should be read together with the branches of the clustering tree. For all plots, we used average-linkage clustering with Euclidean distance. The same analysis was performed for the Conlon signature [see Additional Files 1, 2, 3]. Its clustering performance was similar to that of the ES signature, with most errors occurring in the Garber lung study. The overall classification performance of all signatures is given in Table 2. That table shows that the proposed methods (EM and MCMC) greatly outperform the Conlon signature and also outperform the ES method, although by a smaller margin.
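The clustering step described above can be sketched in a few lines of plain Python. It uses average linkage, where the distance between two clusters is the mean of all pairwise Euclidean distances between their samples. Because POE values share a common scale, samples from different studies can be pooled into one list before clustering. This is a minimal illustration on made-up inputs, not the code behind Figures 6 through 10 or Table 2. The helper `misclassified`, which scores a two-cluster cut against known labels, is a hypothetical scoring rule; the paper's exact error count is not described here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    samples: one vector per tissue sample (e.g. POE values of the signature genes).
    Repeatedly merges the two closest clusters until n_clusters remain and
    returns one cluster label per sample.
    """
    n = len(samples)
    dist = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_dist(c1, c2):
        # Average linkage: mean of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * n
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def misclassified(pred, truth):
    """Hypothetical rule: errors after best matching two unlabeled clusters to two classes."""
    direct = sum(p != t for p, t in zip(pred, truth))
    return min(direct, len(truth) - direct)


if __name__ == "__main__":
    # Hypothetical POE values for 3 signature genes; samples pooled from two studies.
    study_a = [[0.9, 0.8, -0.7], [0.85, 0.9, -0.8], [-0.6, -0.7, 0.9]]
    study_b = [[0.7, 0.95, -0.9], [-0.8, -0.6, 0.8], [-0.9, -0.8, 0.7]]
    samples = study_a + study_b
    truth = [1, 1, 0, 1, 0, 0]  # hypothetical class labels
    labels = average_linkage(samples, 2)
    print(labels, misclassified(labels, truth))
```

The naive loop recomputes cluster distances at every merge, which is fine for a handful of samples. For real datasets, `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="euclidean"` performs the same kind of clustering far more efficiently.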

