Exploratory differential gene expression analysis in microarray experiments with no or limited replication.

Loguinov AV, Mian IS, Vulpe CD - Genome Biol. (2004)

Bottom Line: We describe an exploratory, data-oriented approach for identifying candidates for differential gene expression in cDNA microarray experiments in terms of alpha-outliers and outlier regions, using simultaneous tolerance intervals relative to the line of equivalence (Cy5 = Cy3). We demonstrate the improved performance of our approach over existing single-slide methods using public datasets and simulation studies.


Affiliation: Department of Nutritional Sciences and Toxicology, University of California at Berkeley, Morgan Hall, Berkeley, CA 94720, USA. Avl53@aol.com


Figure 24: Interpreting q-values and calibrating q-value cut-offs. Four plots to facilitate q-value interpretation and calibrate the q-value cut-off [45,46] using the function qplot(). (a) The estimated proportion of true null hypotheses (π0) versus the tuning parameter λ (the software chooses λ automatically by the 'bootstrap' method; the π0 estimate is 0.978). (b) The expected proportion of false positives (q-value) for different p-value cut-offs. (c) The number of significant candidates for differential expression for each q-value. (d) The expected number of false positives as a function of the number of candidates for differential expression called significant. The dotted black line in (a) is the π0 approximation by the bootstrap method. The dotted color lines in (b) (green for expected false positives (FP) on average < 1, red for < 0.1) match q- and p-value levels for the expected FP cut-offs (0.011 and 0.0016 for expected FP < 1; 0.0015 and 0.000015 for expected FP < 0.1, respectively). The dotted color lines in (c) match the q-value cut-offs (0.011 and 0.0015) to the number of significant tests on average (83 and 59). The dotted color lines in (d) match the expected FP cut-offs (< 1 and < 0.1) to the number of significant tests on average (83 and 59, respectively).
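The cut-offs quoted in the caption can be checked with simple arithmetic. The sketch below assumes that the expected number of false positives at a q-value cut-off is roughly the cut-off multiplied by the number of tests called significant. The helper name `expected_false_positives` is hypothetical; only the q-value cut-offs, the counts of significant tests, and the FP bounds come from the caption.

```python
def expected_false_positives(q_cutoff, n_significant):
    """Approximate expected number of false positives among the tests
    called significant at a given q-value cut-off (assumed: q * count)."""
    return q_cutoff * n_significant


# (q-value cut-off, number of significant tests, expected-FP bound), from the caption
caption_values = [
    (0.011, 83, 1.0),
    (0.0015, 59, 0.1),
]

results = []
for q_cutoff, n_significant, fp_bound in caption_values:
    efp = expected_false_positives(q_cutoff, n_significant)
    results.append((q_cutoff, n_significant, efp, efp < fp_bound))
    print(f"q <= {q_cutoff}: {n_significant} significant, "
          f"expected FP ~ {efp:.4f}, below {fp_bound}: {efp < fp_bound}")
```

Under this assumption both pairs sit just below their bounds, which fits the caption's description of the cut-offs as calibrated to expected FP < 1 and < 0.1.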

Mentions: Typically, microarray data involve thousands of genes, so multiplicity of comparisons is clearly a problem. Other model-based single-slide approaches do not consider this issue explicitly (see single-slide procedures described in [1,13,14,17,18]). First, we identify candidate outliers without correction to obtain unadjusted p-values (Table 3). A p-value is the probability of rejecting the null hypothesis when the null hypothesis is true, and represents a measure of statistical significance in terms of the false positive rate. One way to obtain adjusted p-values is to apply a Bonferroni correction based on N (the sample size of the entire dataset), which may be too conservative, so we examine two alternative corrections. In one alternative approach, we apply a multiplicity of comparison correction based on an estimate of k (the number of non-regular observations) rather than the sample size of the entire dataset. This approach emphasizes stable outliers at the expense of other possible outliers (that is, N-k) which are inliers in the current single-slide experiment. Clearly, this Bonferroni correction by k provides a much less conservative result than the correction by N and, we would argue, a more reasonable correction for identifying true outliers. Other robust exploratory tools (see Methods) can be used to estimate k. In a more sophisticated approach to address these issues, the q-value is calculated from the ordered list of unadjusted p-values [45,46] (Figure 24). The q-value is the minimum false discovery rate [47] for a particular feature from a list of all features [45,46]. The false discovery rate is the proportion of true null hypotheses among all hypotheses found to be significant; for example, a false discovery rate of 1% means that, among all candidates for differential expression found significant, 1% are true nulls on average [46].
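The corrections described above can be sketched as follows. This is a minimal illustration, not the authors' implementation or the software behind qplot(). The p-values and the value k = 3 are made-up toy inputs, the function names are hypothetical, and π0 is estimated at a single fixed λ = 0.5 instead of the bootstrap choice of λ mentioned in Figure 24.

```python
def bonferroni(pvalues, n_comparisons):
    """Bonferroni-adjusted p-values for a chosen number of comparisons
    (N for the whole dataset, or k for the estimated non-regular observations)."""
    return [min(p * n_comparisons, 1.0) for p in pvalues]


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true null hypotheses from the share of
    p-values above lam (fixed lam is an assumption of this sketch)."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(n_above / (m * (1.0 - lam)), 1.0)


def qvalues(pvalues, lam=0.5):
    """q-values from the ordered list of unadjusted p-values:
    q_(i) = min over j >= i of pi0 * m * p_(j) / j."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # estimated false discovery rate at which that feature is called significant.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


# Toy unadjusted p-values (hypothetical, for illustration only)
pvals = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.45, 0.6, 0.7, 0.85, 0.95]
N = len(pvals)  # sample size of the entire dataset
k = 3           # assumed estimate of the number of non-regular observations

adj_N = bonferroni(pvals, N)
adj_k = bonferroni(pvals, k)
q = qvalues(pvals)

for p, a_n, a_k, qv in zip(pvals, adj_N, adj_k, q):
    print(f"p={p:<7} Bonferroni(N)={a_n:.4f} Bonferroni(k)={a_k:.4f} q={qv:.4f}")
```

Because k is smaller than N, every p-value adjusted by k is no larger than the same p-value adjusted by N, which is the sense in which the correction by k is less conservative. If no p-value exceeds λ, this simple π0 estimate is zero and all q-values collapse to zero, so a real analysis would need a more careful estimate.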

