Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods.

Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, Liu C - PLoS ONE (2011)

Bottom Line: The data produced by the thousands of microarray studies published annually are confounded by "batch effects," the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.


Affiliation: National Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, People's Republic of China.

ABSTRACT
The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by "batch effects," the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
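The probe-effect claim can be illustrated with a small simulation. The sketch below is not the authors' procedure (this excerpt does not describe it); it assumes that "standardizing at the probe level" means z-scoring each probe across samples, and all data, names and parameters are synthetic and hypothetical. Every probe gets its own baseline intensity, shared by all samples, plus independent noise, so the samples are unrelated by construction.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def standardize_by_probe(matrix):
    """Z-score each probe (row) across samples (assumed meaning of
    probe-level standardization; not taken from the paper)."""
    out = []
    for row in matrix:
        m = statistics.fmean(row)
        s = statistics.pstdev(row)
        out.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return out

def column(matrix, j):
    """Expression profile of sample j across all probes."""
    return [row[j] for row in matrix]

# Synthetic data: 500 probes x 20 unrelated samples (hypothetical sizes).
rng = random.Random(0)
n_probes, n_samples = 500, 20
probe_effect = [rng.gauss(8.0, 2.0) for _ in range(n_probes)]
data = [[probe_effect[p] + rng.gauss(0.0, 0.5) for _ in range(n_samples)]
        for p in range(n_probes)]

# Correlation between two unrelated samples, before and after standardization.
raw_r = pearson(column(data, 0), column(data, 1))
standardized = standardize_by_probe(data)
std_r = pearson(column(standardized, 0), column(standardized, 1))
print(f"raw r = {raw_r:.3f}, standardized r = {std_r:.3f}")
```

Under these assumptions the raw correlation between two unrelated samples is high because both profiles share the probe baselines, while the correlation after per-probe standardization falls to near zero.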

Figure 5 (pone-0017238-g005): ROC curves in AAS data. ROC curves are graphical representations of both specificity and sensitivity that take into account both differentially and non-differentially expressed genes. ComBat_p and ComBat_n performed almost identically, so their curves overlap each other almost completely.

Mentions: We used ROC curves to determine which program best optimized both sensitivity and specificity, i.e., maximized true positives (TP) while minimizing false positives (FP). To create an ROC curve, the TP rate is plotted against the FP rate; the actual test statistic is the area under the curve (AUC) [24] (Figure 5). The larger the AUC, the better the program's performance. The AUC for the unadjusted data was 0.854. ComBat_p and ComBat_n both increased the AUC to 0.937 (p = 4.51e−30 and p = 1.42e−29, respectively), followed by DWD (0.917, p = 5.88e−15), PAMR (0.913, p = 2.25e−13), and Ratio_G (0.895, p = 1.20e−06). SVA did not increase the AUC significantly (0.858, p = 0.27) (Table S3, Row 15). The results were similar in the Affymetrix spike-in data, except that SVA actually decreased the AUC, from 0.93 to 0.76 (p<0.0001) (Figure S4).
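The ROC construction described above (TP rate plotted against FP rate, summarized by the AUC) can be sketched in a few lines. This is a generic illustration, not the authors' code: the function names, the toy scores and the labels are hypothetical, and the AUC is computed with the trapezoidal rule, which is one common choice. In the study's setting, a score would be a per-gene differential-expression statistic and a label would mark whether the gene is truly differentially expressed.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by sweeping a threshold from the
    highest score to the lowest. labels: 1 = truly positive, 0 = negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so tied scores
        # are treated as a single threshold step.
        if prev is not None and scores[i] != prev:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
        if labels[i]:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    fpr.append(fp / neg)
    tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Toy example (hypothetical values): higher score = stronger evidence.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(round(auc(fpr, tpr), 3))  # 8/9, i.e. 0.889
```

An AUC of 1.0 corresponds to a ranking that places every true positive above every negative, and 0.5 to a ranking no better than chance, which is why a larger AUC after batch adjustment indicates better performance.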

