Bayesian meta-analysis models for microarray data: a comparative study.

Conlon EM, Song JJ, Liu A - BMC Bioinformatics (2007)

Bottom Line: We identified similar results when pooling two independent studies of Bacillus subtilis. The Bayesian meta-analysis model that combines probabilities across studies does not aggregate gene expression measures, so an inter-study variability parameter is not included in the model. This results in a simpler modeling approach than aggregating expression measures, an approach that must account for variability across studies.


Affiliation: Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts, USA. econlon@mathstat.umass.edu

ABSTRACT

Background: With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches to meta-analysis of microarrays are combining gene expression measures across studies and combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods.

Results: Two Bayesian meta-analysis models for microarray data have recently been introduced. The first model combines standardized gene expression measures across studies into an overall mean, accounting for inter-study variability, while the second combines probabilities of differential expression without combining expression values. Both models produce the gene-specific posterior probability of differential expression, which is the basis for inference. Since the standardized expression integration model includes inter-study variability, it may improve accuracy of results versus the probability integration model. However, due to the small number of studies typical in microarray meta-analyses, the variability between studies is challenging to estimate. The probability integration model eliminates the need to model variability between studies, and thus its implementation is more straightforward. We found in simulations of two and five studies that combining probabilities outperformed combining standardized gene expression measures for three comparison values: the percent of true discovered genes in meta-analysis versus individual studies; the percent of true genes omitted in meta-analysis versus separate studies; and the number of true discovered genes for fixed levels of Bayesian false discovery. We identified similar results when pooling two independent studies of Bacillus subtilis. We assumed that each study was produced from the same microarray platform with only two conditions, a treatment and a control, and that the data sets were pre-scaled.

Conclusion: The Bayesian meta-analysis model that combines probabilities across studies does not aggregate gene expression measures, so an inter-study variability parameter is not included in the model. This results in a simpler modeling approach than aggregating expression measures, an approach that must account for variability across studies. The probability integration model identified more true discovered genes and fewer true omitted genes than combining expression measures, for our data sets.
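
The third comparison value in the Results, the number of discovered genes at a fixed level of Bayesian false discovery, can be illustrated with a minimal sketch. It assumes the common posterior-expected form of the Bayesian false discovery rate (the average of one minus the posterior probability over the selected genes); the exact estimator used in the study is not stated in this abstract, and the function name and example values are hypothetical.

```python
import numpy as np

def bayesian_fdr(post_prob, threshold):
    """Estimated Bayesian FDR and number of genes selected at a threshold.

    Assumption: FDR is estimated as the mean of (1 - posterior probability)
    over genes whose posterior probability of differential expression is
    at least `threshold`. This is an illustrative form, not necessarily
    the estimator used by the authors.
    """
    post_prob = np.asarray(post_prob, dtype=float)
    selected = post_prob >= threshold
    if not selected.any():
        # No genes selected: report zero discoveries and zero estimated FDR.
        return 0.0, 0
    fdr = float((1.0 - post_prob[selected]).mean())
    return fdr, int(selected.sum())

# Hypothetical posterior probabilities for four genes.
fdr, n_selected = bayesian_fdr([0.99, 0.95, 0.80, 0.40], threshold=0.90)
print(fdr, n_selected)
```

Under this assumption, two models can be compared by lowering each model's threshold until the estimated rate reaches the same fixed level and then counting the truly differentially expressed genes among those selected.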

Figure 1: tIDR versus posterior probability of differential expression for the two-study simulation data. True integration-driven discovery rate (tIDR) versus threshold values of posterior probability of differential expression γ ≥ 0.50, for the standardized expression integration model (blue circles) and probability integration model (black diamonds) for the two-study simulation data with high mean = 0.7 (differentially expressed); 0.07 (non-differentially expressed) and the following simulated percent differentially expressed genes ps: a) 5%; b) 10%; c) 25%.

Mentions: We implemented the Bayesian standardized expression integration model (SEI hereafter, Model (1)) and the Bayesian probability integration model (PI hereafter, Model (2)) to combine the two simulated studies for the three levels of percent differentially expressed genes and three levels of inter-study variation. We also analyzed each study individually. Note again that in individual studies, the SEI and PI models are equivalent, i.e. the only differences between the SEI and PI models are at the inter-study level. To compare the SEI and PI models, we calculated for each model the true integration-driven discovery rate (tIDR) and the true integration-driven revision rate (tIRR) for thresholds of γ ≥ 0.50, i.e. a posterior probability of differential expression greater than or equal to 50%. The PI model produced higher tIDR and lower tIRR than the SEI model for all values of γ ≥ 0.50 for the simulated data. Figures 1 and 2 display the tIDR and tIRR results, respectively, for the three simulated levels of ps and high mean. Table 1 presents results for all simulated data sets, for representative threshold value γ = 0.95.
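
The tIDR and tIRR comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, since the excerpt does not give the formulas: tIDR is taken as the fraction of truly differentially expressed genes found by the meta-analysis that no individual study found, and tIRR as the fraction of truly differentially expressed genes found by at least one individual study that the meta-analysis did not find. The function name, argument names, and example values are hypothetical.

```python
import numpy as np

def integration_rates(meta_prob, study_probs, truly_de, threshold=0.95):
    """Return (tIDR, tIRR) at a posterior probability threshold.

    meta_prob   : posterior probabilities from the meta-analysis, one per gene
    study_probs : array of shape (n_studies, n_genes) from individual analyses
    truly_de    : boolean truth labels, known only for simulated data
    The definitions used here are assumptions stated in the lead-in.
    """
    meta_prob = np.asarray(meta_prob, dtype=float)
    study_probs = np.asarray(study_probs, dtype=float)
    truly_de = np.asarray(truly_de, dtype=bool)

    found_meta = meta_prob >= threshold
    # A gene counts as found individually if any single study reaches the threshold.
    found_any_study = (study_probs >= threshold).any(axis=0)

    true_meta = found_meta & truly_de
    true_individual = found_any_study & truly_de

    n_true_meta = true_meta.sum()
    n_true_individual = true_individual.sum()

    # tIDR: true discoveries gained only through integration.
    tidr = ((true_meta & ~found_any_study).sum() / n_true_meta
            if n_true_meta else float("nan"))
    # tIRR: true discoveries from individual studies lost after integration.
    tirr = ((true_individual & ~found_meta).sum() / n_true_individual
            if n_true_individual else float("nan"))
    return float(tidr), float(tirr)

# Hypothetical example: five genes, two studies, first three genes truly DE.
meta = [0.99, 0.97, 0.60, 0.98, 0.20]
studies = [[0.96, 0.70, 0.97, 0.50, 0.10],
           [0.90, 0.80, 0.40, 0.99, 0.30]]
truth = [True, True, True, False, False]
print(integration_rates(meta, studies, truth, threshold=0.95))
```

Running the function once per model (SEI and PI) over a grid of thresholds from 0.50 upward would produce curves of the kind shown in Figures 1 and 2, with higher tIDR and lower tIRR indicating the better-performing model.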

