Microarray background correction: maximum likelihood estimation for the normal-exponential convolution.

Silver JD, Ritchie ME, Smyth GK - Biostatistics (2008)

Bottom Line: Using a saddle-point approximation, Ritchie and others (2007) found normexp to be the best background correction method for 2-color microarray data. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.


Affiliation: Bioinformatics Division, Walter and Eliza Hall Institute, Parkville 3050, Victoria, Australia. j.silver@biostat.ku.dk

ABSTRACT
Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of 2 random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Using a saddle-point approximation, Ritchie and others (2007) found normexp to be the best background correction method for 2-color microarray data. This article develops the normexp method further by improving the estimation of the parameters. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. Some subtle numerical programming issues are solved which caused the original normexp method to fail occasionally when applied to unusual data sets. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE) using high-quality optimization software and using the saddle-point estimates as starting values. "MLE" is shown to outperform heuristic estimators proposed by other authors, both in terms of estimation accuracy and in terms of performance on real data. The saddle-point approximation is an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.
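The normexp model described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the observed intensity is X = B + S with background B ~ N(mu, sigma^2) and signal S exponential with mean alpha, and uses the standard closed forms for the density of such a convolution and for the conditional expectation E(S | X = x), which serves as the background-corrected intensity. The function names, the parameter names mu, sigma and alpha, and the offset value of 50 are assumptions made for this example; the code makes no attempt at the numerical safeguards or the parameter estimation (saddle-point or MLE) that the article is about.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normexp_logdensity(x, mu, sigma, alpha):
    """Log density of X = B + S, with B ~ N(mu, sigma^2) and S ~ Exp(mean alpha).

    Naive form: norm_cdf underflows to 0 for extremely negative arguments,
    so this sketch is not robust on unusual data.
    """
    mu_sx = x - mu - sigma ** 2 / alpha
    return (
        -math.log(alpha)
        - (x - mu) / alpha
        + sigma ** 2 / (2.0 * alpha ** 2)
        + math.log(norm_cdf(mu_sx / sigma))
    )


def normexp_signal(x, mu, sigma, alpha):
    """E(S | X = x): the mean of a N(mu_sx, sigma^2) variable truncated at zero."""
    mu_sx = x - mu - sigma ** 2 / alpha
    z = mu_sx / sigma
    return mu_sx + sigma * norm_pdf(z) / norm_cdf(z)


def corrected_log2(x, mu, sigma, alpha, offset=50.0):
    """Background-corrected log2 intensity with a small offset added first.

    The default offset is an illustrative assumption, not a recommended value.
    """
    return math.log2(normexp_signal(x, mu, sigma, alpha) + offset)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for demonstration.
    mu, sigma, alpha = 100.0, 10.0, 1000.0
    for x in (50.0, 100.0, 200.0, 2000.0):
        print(x, normexp_signal(x, mu, sigma, alpha), corrected_log2(x, mu, sigma, alpha))
```

Under this model the corrected intensity stays positive even when the observed intensity falls below the background mean, and for bright spots it is close to x - mu. Adding the offset before taking logs compresses the variability of the low-intensity log-ratios, which is the variance-stabilizing effect the abstract refers to.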

© Copyright Policy - open-access

fig4: Number of false discoveries from the mixture data set using moderated t-statistics from (a) limma and (b) SAM. Each curve is an average over the 5 mixtures.

Mentions: Figure 4 shows the number of false discoveries for each method versus the number of genes selected, with genes ranked from largest to smallest absolute t-statistic, for (a) limma and (b) SAM. The curves have been averaged over the 5 dye-swap pairs. The limma curves show that adding an offset reduces the false-discovery rate, with the best performance achieved by MLE and saddle, followed by RMA-75 and then RMA. For SAM, the advantage of “MLE + offset” and “saddle + offset” over the other methods is even more marked. SAM appears to penalize the methods which do not stabilize the variance.
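The false-discovery curves described above can be computed along the following lines. This is a hedged sketch of the general procedure only: it assumes each gene has a t-statistic and a known truth label saying whether it is really differentially expressed, and the function names and toy values are hypothetical, not taken from the article.

```python
def false_discovery_curve(t_stats, is_truly_de):
    """Cumulative false discoveries when genes are ranked by |t|, largest first.

    Element k of the result is the number of genes that are not truly
    differentially expressed among the top k + 1 ranked genes.
    """
    order = sorted(range(len(t_stats)), key=lambda i: abs(t_stats[i]), reverse=True)
    curve = []
    false_count = 0
    for i in order:
        if not is_truly_de[i]:
            false_count += 1
        curve.append(false_count)
    return curve


def average_curves(curves):
    """Pointwise average of several equal-length curves (e.g. one per array pair)."""
    return [sum(values) / len(values) for values in zip(*curves)]


if __name__ == "__main__":
    # Toy values for illustration only.
    t_stats = [5.0, -4.0, 0.5, 3.0, -0.2]
    truth = [True, False, False, True, False]
    print(false_discovery_curve(t_stats, truth))
```

A method whose curve stays lower for a given number of selected genes ranks fewer false positives near the top of the list, which is how the methods in the figure are compared.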

