Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models.

Hu P, Greenwood CM, Beyene J - BMC Bioinformatics (2005)

Bottom Line: We extended traditional effect size models to combine information from different microarray datasets by incorporating a quality measure for each gene in each study into the effect size estimation. We live in a high-throughput era where technologies constantly change leaving behind a trail of data with different forms, shapes and sizes. Statistical and computational methodologies are therefore critical for extracting the most out of these related but not identical sources of data.


Affiliation: The Hospital for Sick Children Research Institute, 555 University Ave, Toronto, ON, M5G 1X8, Canada. phu@sickkids.ca

ABSTRACT

Background: With the explosion of microarray studies, an enormous amount of data is being produced. Systematic integration of gene expression data from different sources increases statistical power of detecting differentially expressed genes and allows assessment of heterogeneity. The challenge, however, is in designing and implementing efficient analytic methodologies for combination of data generated by different research groups.

Results: We extended traditional effect size models to combine information from different microarray datasets by incorporating a quality measure for each gene in each study into the effect size estimation. We illustrated our method by integrating two datasets generated using different Affymetrix oligonucleotide types. Our results indicate that the proposed quality-adjusted weighting strategy for modelling inter-study variation of gene expression profiles not only increases consistency and decreases heterogeneous results between these two datasets, but also identifies many more differentially expressed genes than methods proposed previously.

Conclusion: Data integration and synthesis is becoming increasingly important. We live in a high-throughput era where technologies constantly change leaving behind a trail of data with different forms, shapes and sizes. Statistical and computational methodologies are therefore critical for extracting the most out of these related but not identical sources of data.


Figure 3: Quantile – Quantile plots of the observed versus the expected Q statistic: (a) with quality adjustment, and (b) without quality adjustment.

Mentions: Figure 3 shows the adjusted and unadjusted quantile-quantile (Q-Q) plots of the observed vs. expected Q values. Q is the test statistic we used for assessing heterogeneity, and is described in detail later in the Methods section. In the adjusted Q-Q plot, the quality score was used as a weight in the computation of Q, while it was not considered in the unadjusted Q-Q plot. From these graphs, we can see that the quantiles of the observed Q values are far from the expected quantiles of a chi-square distribution, suggesting that these two datasets generated heterogeneous results beyond random sampling errors. Therefore, we applied the random effect model in this study. The quantiles of the Q statistic were closer to the quantiles of the expected chi-square distribution when quality adjustment was considered (Figure 3(a)) than when it was not (Figure 3(b)). The variance of the unadjusted Q values was 9.45, but it was reduced to 3.31 when quality adjustment was used. This result suggests that the incorporation of the adjusted quality measure into effect size estimation can increase consistency and decrease heterogeneity between these two datasets.
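
To make the comparison behind the Q-Q plots concrete, the following is a minimal sketch of a Cochran-style Q statistic computed per gene, with and without quality weights, together with the expected chi-square quantiles for the two-study case (1 degree of freedom). The weight form used here (quality score times inverse variance), the function names, and the toy numbers are all assumptions made for illustration; the paper's Methods section defines the actual statistic.

```python
from statistics import NormalDist


def cochran_q(effects, variances, qualities=None):
    """Cochran-style Q for one gene across studies.

    effects   -- per-study effect size estimates
    variances -- per-study sampling variances of those estimates
    qualities -- optional per-study quality scores (assumed to lie in [0, 1])
    """
    if qualities is None:
        qualities = [1.0] * len(effects)  # unadjusted case: plain inverse variance
    # Assumed weighting: quality score times inverse variance.
    weights = [q / v for q, v in zip(qualities, variances)]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def expected_chi2_1df_quantiles(n):
    """Expected chi-square (1 df) quantiles at plotting positions (i - 0.5) / n.

    Uses the fact that the square of a standard normal variable is
    chi-square distributed with 1 degree of freedom.
    """
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((1 + (i - 0.5) / n) / 2) ** 2
        for i in range(1, n + 1)
    ]


# Made-up values for three genes in two studies (illustration only):
# (effect sizes, sampling variances, quality scores)
genes = [
    ([0.8, 0.2], [0.04, 0.05], [0.9, 0.4]),
    ([1.1, 1.0], [0.03, 0.06], [0.8, 0.7]),
    ([-0.5, 0.6], [0.05, 0.05], [0.3, 0.9]),
]

q_unadjusted = sorted(cochran_q(y, v) for y, v, _ in genes)
q_adjusted = sorted(cochran_q(y, v, q) for y, v, q in genes)
expected = expected_chi2_1df_quantiles(len(genes))

# Pairing each sorted list with `expected` gives the points of a Q-Q plot.
for exp, unadj, adj in zip(expected, q_unadjusted, q_adjusted):
    print(f"expected={exp:.3f}  unadjusted={unadj:.3f}  adjusted={adj:.3f}")
```

For reference, a chi-square distribution with 1 degree of freedom has variance 2, which gives a baseline for reading the reported variances of 9.45 (unadjusted) and 3.31 (adjusted).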

