Error control variability in pathway-based microarray analysis.

Gold DL, Miecznikowski JC, Liu S - Bioinformatics (2009)

Bottom Line: In consideration of the variability in test results, we find that the widely used Benjamini and Hochberg's (BH) false discovery rate (FDR) analysis is less robust than alternative procedures. BH's error control requires a large number of hypothesis tests, a reasonable assumption for differential gene expression analysis, though not the case with pathway-based analysis. Therefore, we advocate through a series of simulations and applications to real gene expression data that researchers control the number of false positives rather than the FDR.

Affiliation: Department of Biostatistics, Roswell Park Cancer, Buffalo, NY, USA. dlgold@buffalo.edu

ABSTRACT

Motivation: The decision to commit some or many false positives in practice rests with the investigator. Unfortunately, not all error control procedures perform the same. Our problem is to choose an error control procedure to determine a P-value threshold for identifying differentially expressed pathways in high-throughput gene expression studies. Pathway analysis involves fewer tests than differential gene expression analysis, on the order of a few hundred. We discuss and compare methods for error control for pathway analysis with gene expression data.

Results: In consideration of the variability in test results, we find that the widely used Benjamini and Hochberg's (BH) false discovery rate (FDR) analysis is less robust than alternative procedures. BH's error control requires a large number of hypothesis tests, a reasonable assumption for differential gene expression analysis, though not the case with pathway-based analysis. Therefore, we advocate through a series of simulations and applications to real gene expression data that researchers control the number of false positives rather than the FDR.

© Copyright Policy - creative-commons

Figure 1: Simulation of sampling variability in FDR and rFDR for an FDR control level of 10%. Top row: Simulation 1; bottom row: Simulation 2.

Mentions: There is a discordance between the FDR and rFDR for pathway analysis. In Figure 1, the variability in the rFDR for both Simulations 1 and 2 is extreme and is quite discordant from the FDR control level. Researchers should find this troubling. The frequency histogram of , shows the sampling variability in the decision rule to reject. This variability is better illustrated through the FDR function, as a function of , not to be confused with the desired FDR control level, which is a constant. In the results of Simulation 1, there is a spike in the probability that the , and a long right tail. Similar results are observed for Simulation 2. This indicates that the method tends to be overly conservative for N = 250 tests. Summary statistics for each simulation are listed in Table 1 of the Supplementary Materials, including results for BH control at α = 0.01. Note that for N = 5000 (results available in the Supplementary Materials), the discordance between FDR and rFDR disappears.
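
The kind of sampling variability described above can be explored with a small simulation. The following is a minimal sketch, assuming independent one-sided z-tests; it is not the authors' Simulation 1 or 2. The function names `bh_reject` and `simulate_realized_fdp`, the 10% share of non-null tests, and the effect size of 3 are illustrative assumptions. rFDR is read here as the realized proportion of false discoveries in a single data set, which is an assumption about the notation.

```python
from math import erfc, sqrt

import numpy as np


def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up rule: boolean mask of rejected hypotheses."""
    n = len(pvals)
    order = np.argsort(pvals)
    # Compare the i-th smallest p-value with q * i / n.
    below = pvals[order] <= q * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank that meets its bound
        reject[order[: k + 1]] = True   # reject everything up to that rank
    return reject


def simulate_realized_fdp(n_tests, frac_alt, effect, q, n_reps, seed=0):
    """Realized false discovery proportion of BH in each simulated data set.

    Assumed (hypothetical) setup: z-scores are N(0, 1) under the null and
    N(effect, 1) under the alternative; tests are independent.
    """
    rng = np.random.default_rng(seed)
    n_alt = int(round(n_tests * frac_alt))
    is_null = np.ones(n_tests, dtype=bool)
    is_null[:n_alt] = False
    fdp = np.empty(n_reps)
    for r in range(n_reps):
        z = rng.standard_normal(n_tests)
        z[:n_alt] += effect
        # One-sided upper-tail p-value: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
        pvals = np.array([0.5 * erfc(v / sqrt(2)) for v in z])
        rej = bh_reject(pvals, q)
        n_rej = rej.sum()
        # Convention: the proportion is 0 when nothing is rejected.
        fdp[r] = (rej & is_null).sum() / n_rej if n_rej > 0 else 0.0
    return fdp


if __name__ == "__main__":
    for n in (250, 5000):
        fdp = simulate_realized_fdp(n, frac_alt=0.1, effect=3.0, q=0.10, n_reps=200)
        print(n, round(fdp.mean(), 3), round(fdp.std(), 3))
```

Comparing the spread of the returned values for `n_tests=250` and `n_tests=5000` is one way to look at how the number of tests affects the variability of the realized proportion under these assumed settings.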

