Reference-free cell mixture adjustments in analysis of DNA methylation data.

Houseman EA, Molitor J, Marsit CJ - Bioinformatics (2014)


Affiliation: School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA and Section of Biostatistics and Epidemiology, Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.

ABSTRACT

Motivation: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known.

Results: We propose a method for conducting epigenome-wide association study analyses when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets.

Availability and implementation: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981.

Contact: andres.houseman@oregonstate.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Fig. 1. Simulation 1: estimated effect by true effect. Comparison of slope estimates: true direct effect versus its estimate from the proposed method, true direct effect versus the SVA-adjusted estimate, and true direct effect versus the unadjusted estimate. Squares indicate DMRs. Red indicates non-null CpGs. Black squares represent non-null DMRs.

Mentions: For simulation #1, Figure 1 compares three slope estimates against the true direct effect on each of the m = 1000 features: the proposed estimate, the SVA-adjusted estimate and the unadjusted estimate. This figure demonstrates that although we expect the naive unadjusted estimator to provide an unbiased estimator of the total effect, its estimates of direct effects are somewhat biased in comparison to our proposed estimator, especially for the slopes. It also demonstrates that SVA produces biases similar to the unadjusted analysis. This figure is consistent with Table 2, which reports the total RMSE [e.g. the square root of the simulation average of the squared difference between an estimate and the corresponding true effect] for each of the four comparisons across all four scenarios. Table 2 suggests similar behavior for direct effects when the cell mixture effect is non-null (simulation #3). When the mixture effect is null, the unadjusted and proposed estimators estimate the direct effect with about the same precision, as one would anticipate. Under simulation scenario #1, for each of 1000 features, Figure 2 plots simulation SD versus median bootstrap estimate (across 100 simulations) of the standard error of the direct effect estimator. The bootstrap procedure appears tolerably unbiased, although our bootstrap standard error estimator yields apparently inflated estimates for some DMRs and a handful of non-DMR CpGs having non-null effect. The two very biased estimates result from intercepts lying near the zero boundary for mean methylation μ, resulting in non-linear effects (because of truncation) for some subjects having strongly negative values of x; in the Supplementary Material we provide evidence that the bias decreases in larger samples (Section III). Table 2 reports the median (over CpGs) of the ratio of median bootstrap standard error (over simulations) to simulation SD for all four scenarios, for both the proposed and the unadjusted estimators, standard errors for the latter being computed using the standard linear model theory approach.
Although the proposed bootstrap standard error methodology is imperfect, it appears to be as good as or better than the standard asymptotic methods used to compute standard errors for unadjusted effect estimates. Supplementary Material (Section III) provides plots similar to those provided in Figures 1 and 2 for simulation scenarios #2, #3 and #4; they are consistent with the results and interpretations given here.
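
The distinction between total and direct effects drawn above can be illustrated with a small toy simulation. The sketch below is not the RefFreeEWAS procedure and does not reproduce the paper's simulation settings: the data-generating model, the parameter values and all function names (`ols_slopes`, `simulate`, `total_rmse`, `run`) are assumptions made for illustration, and the "adjusted" fit is an oracle that regresses on the true mixture coordinates, which a real analysis would not observe. The RMSE definition used here (square root of the simulation average of the summed squared error over features) is likewise an assumed reading of "total RMSE".

```python
import numpy as np


def ols_slopes(Y, covariates):
    """Per-feature OLS slope on the first covariate; Y is features x subjects."""
    n = Y.shape[1]
    design = np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(design, Y.T, rcond=None)
    return coef[1]  # row 0 holds intercepts, row 1 the slopes on x


def simulate(rng, direct, M, gamma, n=300, noise=0.05):
    """Toy data: methylation = direct effect of x + mixture effect + noise."""
    x = rng.normal(size=n)
    # Assumption: mixture coordinates shift linearly with x (the confounding).
    omega = np.outer(x, gamma) + rng.normal(scale=0.5, size=(n, len(gamma)))
    Y = np.outer(direct, x) + M @ omega.T
    Y = Y + rng.normal(scale=noise, size=Y.shape)
    return x, omega, Y


def total_rmse(estimates, truth):
    """Square root of the simulation average of the summed squared error."""
    err = np.asarray(estimates) - truth
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))


def run(n_sim=50, m=40, k=2, seed=0):
    rng = np.random.default_rng(seed)
    direct = np.where(np.arange(m) < 10, 0.1, 0.0)  # 10 non-null features
    M = rng.normal(scale=0.2, size=(m, k))          # cell-type-specific means
    gamma = np.linspace(0.5, -0.3, k)               # effect of x on mixture
    total = direct + M @ gamma                      # total = direct + mediated
    unadjusted, oracle = [], []
    for _ in range(n_sim):
        x, omega, Y = simulate(rng, direct, M, gamma)
        unadjusted.append(ols_slopes(Y, x))
        # Oracle adjustment: include the true mixture coordinates as covariates.
        oracle.append(ols_slopes(Y, np.column_stack([x, omega])))
    return {
        "unadjusted_vs_total": total_rmse(unadjusted, total),
        "unadjusted_vs_direct": total_rmse(unadjusted, direct),
        "oracle_vs_direct": total_rmse(oracle, direct),
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting the unadjusted slopes track the total effect closely but miss the direct effect whenever the mixture shifts with x, which is the qualitative pattern the text attributes to the unadjusted analysis.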

