Investigation of reproducibility of differentially expressed genes in DNA microarrays through statistical simulation.

Fan X, Shi L, Fang H, Harris S, Perkins R, Tong W - BMC Proc (2009)

Affiliation: National Center for Toxicological Research (NCTR), US Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR 72079, USA. fanxh@zju.edu.cn

ABSTRACT
Recent publications have raised concerns about the reliability of microarray technology because of the lack of reproducibility of differentially expressed genes (DEGs) from highly similar studies across laboratories and platforms. The rat toxicogenomics study of the MicroArray Quality Control (MAQC) project empirically revealed that the DEGs selected using a fold change (FC)-based criterion were more reproducible than those derived solely by statistical significance, such as the P-value from a simple t-test. In this study, we generate a set of simulated microarray datasets to compare gene selection/ranking rules, including P-value, FC and their combinations, using the percentage of overlapping genes (POG) between DEGs from two similar simulated datasets as the measure of reproducibility. The results support the MAQC's conclusion that DEG lists are more reproducible across laboratories and platforms when FC-based ranking coupled with a nonstringent P-value cutoff is used for gene selection than when selection is based on P-value ranking. We conclude that the MAQC recommendation should be considered when reproducibility is an important study objective.
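
As a minimal sketch of the reproducibility measure described in the abstract, the percentage of overlapping genes between two DEG lists can be computed as the number of shared genes divided by the list length. The function name `pog` and the toy gene identifiers are illustrative assumptions, not taken from the article, and the sketch assumes both lists contain the same number of genes.

```python
def pog(list_a, list_b):
    """Percentage of overlapping genes between two equal-length DEG lists."""
    if len(list_a) != len(list_b):
        raise ValueError("this sketch assumes both lists have the same length")
    if not list_a:
        return 0.0
    shared = set(list_a) & set(list_b)
    return 100.0 * len(shared) / len(list_a)


# Toy example: four genes selected from each of two hypothetical datasets.
degs_1 = ["g1", "g2", "g3", "g4"]
degs_2 = ["g2", "g4", "g5", "g6"]
overlap = pog(degs_1, degs_2)  # two shared genes out of four
```

The measure depends only on list membership, so the order of genes within each list does not affect the result.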

Figure 2: The relationship of POG with the magnitude of the expression difference between the treated and control groups. (A) Magnitude = 0.6; (B) Magnitude = 1.5; and (C) Magnitude = 0.2. The simulated datasets had CV = 30% and sample size = 50. The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the POG (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. The red and blue numbers give the POG (%) when 500 genes (red dashed line) are selected as DEGs using P rank ordering only and FC rank ordering with P < 0.05, respectively.

Mentions: Figure 2 compares six gene selection methods on three datasets, each with a different magnitude of differential expression between the treated and control groups (i.e., FC = 1.5, 0.6 and 0.2). As in Figure 1, the FC-based methods were more reproducible than the P-based method. Furthermore, POG increases with the magnitude of differential expression for the FC-based selection methods. For the P-value-based selection methods, however, this trend is equivocal.
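
The kind of comparison described here can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' simulation design: the Gaussian log2-scale noise model, the use of `sigma=0.3` as a rough stand-in for CV = 30%, the uniform effect model, the gene counts, the reading of sample size as 50 per group, and all function names are hypothetical. Only two of the six selection rules are shown, and the P < 0.05 cutoff uses a normal approximation to the t distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def gene_stats(rng, true_shifts, sigma, n):
    """Simulate one dataset; return a (log2 FC, t statistic) pair per gene."""
    stats = []
    for shift in true_shifts:
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(shift, sigma) for _ in range(n)]
        fc = mean(treated) - mean(control)
        se = math.sqrt((variance(treated) + variance(control)) / n)
        stats.append((fc, fc / se))
    return stats


def top_by_p(stats, k):
    """Rank by |t|; with equal group sizes this matches ranking by P-value."""
    order = sorted(range(len(stats)), key=lambda g: -abs(stats[g][1]))
    return order[:k]


def top_by_fc_with_p_cutoff(stats, k, alpha=0.05):
    """Keep genes passing the P cutoff, then rank the survivors by |FC|."""
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation
    passed = [g for g in range(len(stats)) if abs(stats[g][1]) > t_crit]
    passed.sort(key=lambda g: -abs(stats[g][0]))
    return passed[:k]


def pog(a, b):
    """Percentage of overlapping genes, relative to the longer list."""
    size = max(len(a), len(b))
    return 100.0 * len(set(a) & set(b)) / size if size else 0.0


rng = random.Random(0)
n_genes, n_true, k = 500, 50, 25  # hypothetical sizes, kept small for speed
results = {}
for magnitude in (0.2, 0.6, 1.5):
    # Assumed effect model: truly changed genes shift by up to +/- magnitude,
    # and the same true shifts underlie both simulated datasets.
    shifts = [rng.uniform(-magnitude, magnitude) if g < n_true else 0.0
              for g in range(n_genes)]
    d1 = gene_stats(rng, shifts, sigma=0.3, n=50)
    d2 = gene_stats(rng, shifts, sigma=0.3, n=50)
    results[magnitude] = (
        pog(top_by_p(d1, k), top_by_p(d2, k)),
        pog(top_by_fc_with_p_cutoff(d1, k), top_by_fc_with_p_cutoff(d2, k)),
    )
    print(magnitude, results[magnitude])
```

Each entry of `results` holds the POG for P-based ranking followed by the POG for FC ranking with the P cutoff. The numbers depend on the assumed noise and effect models, so they should not be read as a reproduction of the values in Figure 2.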

