Empirical Bayes accomodation of batch-effects in microarray data using identical replicate reference samples: application to RNA expression profiling of blood from Duchenne muscular dystrophy patients.

Walker WL, Liao IH, Gilbert DL, Wong B, Pollard KS, McCulloch CE, Lit L, Sharp FR - BMC Genomics (2008)

Bottom Line: It is often impossible to compare groups of samples from independent experiments because batch effects confound true gene expression differences. We examine the effects of non-biological variation within a single experiment and between experiments. Batch correction has a significant impact on which genes are identified as differentially regulated.


Affiliation: Department of Neurology and MIND Institute, University of California at Davis, Sacramento, California, USA. wwalker@ucdavis.edu

ABSTRACT

Background: Non-biological experimental error routinely occurs in microarray data collected in different batches. It is often impossible to compare groups of samples from independent experiments because batch effects confound true gene expression differences. Existing methods can correct for batch effects only when samples from all biological groups are represented in every batch.

Results: In this report we describe a generalized empirical Bayes approach to correct for cross-experimental batch effects, allowing direct comparisons of gene expression between biological groups from independent experiments. The proposed experimental design uses identical reference samples in each batch in every experiment. These reference samples are from the same tissue as the experimental samples. This design with tissue matched reference samples allows a gene-by-gene correction to be performed using fewer arrays than currently available methods. We examine the effects of non-biological variation within a single experiment and between experiments.
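As an illustration of the idea of gene-by-gene correction from shared reference samples, the following is a minimal sketch and not the authors' generalized model. It assumes log-scale expression values, at least two reference replicates per batch, and a simple location-only shrinkage in which each gene's batch offset is pulled toward the batch-wide mean offset. The function names `batch_offsets` and `apply_offsets`, the data layout, and the example values are all hypothetical.

```python
from statistics import mean, pvariance, variance


def batch_offsets(ref):
    """Estimate a shrunken per-gene offset for each batch.

    ref[batch][gene] is a list of log-expression values measured on the
    identical reference sample (assumed layout, >= 2 replicates per batch).
    """
    batches = sorted(ref)
    genes = sorted(ref[batches[0]])
    # Per-gene mean of the reference sample over all batches.
    grand = {g: mean([v for b in batches for v in ref[b][g]]) for g in genes}
    offsets = {}
    for b in batches:
        # Raw offset: how far this batch's reference mean sits from the grand mean.
        raw = {g: mean(ref[b][g]) - grand[g] for g in genes}
        n = len(ref[b][genes[0]])
        # Sampling variance of a replicate mean, pooled over genes.
        noise = mean([variance(ref[b][g]) for g in genes]) / n
        # Prior mean and variance estimated from the data by moments.
        prior_mean = mean(list(raw.values()))
        prior_var = max(pvariance(list(raw.values())) - noise, 0.0)
        total = prior_var + noise
        weight = prior_var / total if total > 0 else 0.0
        # Shrink each raw offset toward the batch-wide mean offset.
        offsets[b] = {g: prior_mean + weight * (raw[g] - prior_mean) for g in genes}
    return offsets


def apply_offsets(samples, offsets):
    """Subtract the estimated batch offset from every experimental value."""
    return {
        b: {g: [v - offsets[b][g] for v in vals] for g, vals in by_gene.items()}
        for b, by_gene in samples.items()
    }


# Hypothetical reference and experimental measurements for two batches.
reference = {
    "batch1": {"geneA": [7.9, 8.1], "geneB": [5.0, 5.2]},
    "batch2": {"geneA": [8.4, 8.6], "geneB": [5.4, 5.8]},
}
experimental = {
    "batch1": {"geneA": [8.3], "geneB": [4.9]},
    "batch2": {"geneA": [9.1], "geneB": [5.5]},
}
corrected = apply_offsets(experimental, batch_offsets(reference))
print(corrected)
```

Because the offsets come only from the reference samples, a batch does not need to contain every biological group for its experimental samples to be adjusted, which is the design point the abstract emphasizes.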

Conclusion: Batch correction has a significant impact on which genes are identified as differentially regulated. Using this method, gene expression in the blood of patients with Duchenne Muscular Dystrophy is shown to differ for hundreds of genes when compared to controls. The numbers of specific genes differ depending upon whether between-experiment and/or between-batch corrections are performed.



Figure 7: Common genes in lists of differentially expressed genes for three sets of gene expression values: (1) unadjusted, (2) t-test filtered, and (3) empirical Bayes adjusted data. There are 239 genes common to all three gene lists.

Mentions: Figure 7 shows the genes identified as differentially expressed in patients versus controls for the different methods explored in this paper: empirical Bayes adjustment for within and between experiment variation (Model 1: 629 genes), t-test filtering (273 genes), and unadjusted data (527 genes). The relatively small number common to all three gene lists (239 genes) illustrates the substantial effect of correcting for batch effects. Nearly 90% (239 out of 273) of the genes identified by the t-test filter were also identified by the empirical Bayes method, which identifies a number of additional genes.
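The overlap figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the gene counts stated in the text; the variable and function names are illustrative. It reports what share of each list falls in the 239-gene set common to all three lists.

```python
# Gene counts reported in the text for each analysis.
counts = {
    "empirical_bayes": 629,
    "t_test_filter": 273,
    "unadjusted": 527,
}
common_to_all = 239  # genes shared by all three lists


def share_of_list(common: int, list_size: int) -> float:
    """Return the percentage of a gene list that lies in the common set."""
    return 100.0 * common / list_size


shares = {name: share_of_list(common_to_all, n) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of list is common to all three")
```

The t-test filter share works out to about 87.5%, matching the "nearly 90%" in the text, while the common set covers well under half of the empirical Bayes and unadjusted lists.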

