Assessing differential expression in two-color microarrays: a resampling-based empirical Bayes approach.

Li D, Le Pape MA, Parikh NI, Chen WX, Dye TD - PLoS ONE (2013)

Bottom Line: The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, its fold change criteria are problematic and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates than Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large, for both normally and non-normally distributed data.


Affiliation: Office of Public Health Studies, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, Hawaii, United States of America.

ABSTRACT
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, its fold change criteria are problematic and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach that combines resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but is also impervious to the fold change threshold, since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control between the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates than Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large, for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
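
The abstract contrasts parametric testing with resampling, which avoids a normality assumption by building the null distribution from the data. As a rough illustration of that general idea only, the sketch below computes a two-sided permutation p-value for a difference in group means for a single gene. It is not the authors' Resampling-based empirical Bayes procedure (there is no empirical Bayes variance moderation here), and the function name `permutation_p_value`, the choice of statistic, and the defaults `n_perm` and `seed` are assumptions made for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Illustrative only: a plain mean-difference statistic, not the
    moderated statistic used by the methods compared in the paper.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    treated = [10.1, 10.2, 10.3, 10.4, 10.5, 10.6]
    print(permutation_p_value(control, treated))
```

With only a few samples per group, the number of distinct label assignments is small, so the smallest attainable p-value is bounded away from zero. This is one reason resampling-based tests can lose sensitivity at small sample sizes.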


Figure 2: Sensitivity, specificity, total rejection, and estimated false discovery rate comparisons between the RBMs and the PM for normally distributed gene expression data. Blue: PM; Grey: SAM; Red: RBM test statistic based permutation method; Orange: RBM p-value based permutation method; Green: RBM test statistic based bootstrap method; Purple: RBM p-value based bootstrap method. Figure 2a: sample size n = 4 in each group; Figure 2b: sample size n = 6 in each group; Figure 2c: sample size n = 12 in each group; Figure 2d: sample size n = 24 in each group; Figure 2e: sample size n = 48 in each group.


Mentions: In terms of sensitivity (power), the PM shows very high sensitivity across all sample sizes, and higher sensitivity than all other methods, even when the sample size is small, e.g., 4 or 6 in each group (Figure 2a, 2b and Table 2). Both the SAM and the PBB have lower sensitivity than the other methods for small sample sizes. However, sensitivity improves significantly as the sample size in each group increases for the PBB method, but not for the SAM method. The SAM method shows low sensitivity when the proportion of differentially expressed genes is over 50%, regardless of sample size (Figure 2 and Table 2). All other RBM methods show good sensitivity levels, comparable to the PM method when n ≥ 6 in each group (Figure 2b, 2c, 2d, 2e and Table 2).
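
The four quantities compared in Figure 2 can be computed in a simulation, where the truly differentially expressed genes are known. The sketch below is a minimal illustration under that assumption. The function name `classification_summary` and the toy inputs are hypothetical, and the `fdr` value is the realized false discovery proportion of one simulated data set, which may differ from the estimated false discovery rate reported in the paper.

```python
def classification_summary(truly_de, rejected):
    """Summarize test decisions against known simulation truth.

    truly_de: list of bools, True if the gene is truly differentially expressed.
    rejected: list of bools, True if the method declared the gene significant.
    """
    if len(truly_de) != len(rejected):
        raise ValueError("truly_de and rejected must have the same length")

    tp = sum(1 for t, r in zip(truly_de, rejected) if t and r)
    fp = sum(1 for t, r in zip(truly_de, rejected) if not t and r)
    fn = sum(1 for t, r in zip(truly_de, rejected) if t and not r)
    tn = sum(1 for t, r in zip(truly_de, rejected) if not t and not r)

    total_rejections = tp + fp
    return {
        # Fraction of truly DE genes that were detected (power).
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of non-DE genes that were correctly not rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "total_rejections": total_rejections,
        # Realized false discovery proportion; defined as 0 with no rejections.
        "fdr": fp / total_rejections if total_rejections else 0.0,
    }


if __name__ == "__main__":
    # Toy example: 4 truly DE genes followed by 6 non-DE genes.
    truth = [True] * 4 + [False] * 6
    calls = [True, True, True, False, True, False, False, False, False, False]
    print(classification_summary(truth, calls))
```

In this toy example the method detects 3 of the 4 truly differentially expressed genes and makes 1 false rejection among 4 total rejections.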

