A statistical model for the analysis of beta values in DNA methylation studies

ABSTRACT

Background: The analysis of DNA methylation is a key component in the development of personalized treatment approaches. A common way to measure DNA methylation is the calculation of beta values, which are bounded variables of the form M/(M+U) that are generated by Illumina’s 450k BeadChip array. The statistical analysis of beta values is considered to be challenging, as traditional methods for the analysis of bounded variables, such as M-value regression and beta regression, are based on regularity assumptions that are often too strong to adequately describe the distribution of beta values.
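Both quantities contrasted above can be computed directly from the signal intensities. The minimal Python/NumPy sketch below uses the definition given here, β = M/(M+U), and the standard M-value log2(M/U), which equals the base-2 logit of β. The function names are hypothetical. Any offset that preprocessing pipelines may add to the denominator is deliberately left out.

```python
import numpy as np


def beta_value(m, u):
    """Beta value M / (M + U); lies in (0, 1) for positive intensities."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    return m / (m + u)


def m_value(m, u):
    """M-value log2(M / U), the unbounded response used in M-value regression."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    return np.log2(m / u)


def beta_to_m(beta):
    """Base-2 logit of a beta value; identical to log2(M / U)."""
    beta = np.asarray(beta, dtype=float)
    return np.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    # Illustrative (made-up) methylated and unmethylated intensities.
    m = np.array([3000.0, 500.0, 1200.0])
    u = np.array([1000.0, 4500.0, 1200.0])
    beta = beta_value(m, u)
    print(beta)  # beta values 0.75, 0.1 and 0.5
    print(np.allclose(beta_to_m(beta), m_value(m, u)))  # True
```

The logit identity is why M-value regression can be described as linear regression on logit-transformed beta values. The transform removes the (0, 1) bound but does not model how M and U jointly generate β.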

Results: We develop a statistical model for the analysis of beta values that is derived from a bivariate gamma distribution for the signal intensities M and U. By allowing for possible correlations between M and U, the proposed model explicitly takes into account the data-generating process underlying the calculation of beta values. Using simulated data and a real sample of DNA methylation data from the Heinz Nixdorf Recall cohort study, we demonstrate that the proposed model fits our data significantly better than beta regression and M-value regression.
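The abstract does not say which bivariate gamma distribution underlies the model. The sketch below therefore substitutes one standard construction as an explicit assumption: the shared-component (trivariate reduction) model M = X0 + X1, U = X0 + X2, where the X_i ~ Gamma(k_i, θ) are independent and share a common scale θ.

Under this construction:
- M ~ Gamma(k0 + k1, θ) and U ~ Gamma(k0 + k2, θ).
- Corr(M, U) = k0 / sqrt((k0 + k1)(k0 + k2)).
- When k0 = 0, the intensities are independent and M/(M+U) ~ Beta(k1, k2). Within this construction, beta-distributed beta values therefore correspond to uncorrelated intensities.

Function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_intensities(n, k0, k1, k2, theta=1.0, seed=0):
    """Draw correlated (M, U) pairs via a shared gamma component (assumed construction).

    M = X0 + X1 and U = X0 + X2 with independent X_i ~ Gamma(k_i, theta);
    k0 = 0 gives independent intensities.
    """
    rng = np.random.default_rng(seed)
    shared = rng.gamma(k0, theta, size=n) if k0 > 0 else np.zeros(n)
    m = shared + rng.gamma(k1, theta, size=n)
    u = shared + rng.gamma(k2, theta, size=n)
    return m, u


def intensity_correlation(k0, k1, k2):
    """Theoretical Corr(M, U) for the shared-component construction."""
    return k0 / np.sqrt((k0 + k1) * (k0 + k2))


if __name__ == "__main__":
    m, u = simulate_intensities(200_000, k0=2.0, k1=3.0, k2=1.0, seed=42)
    beta = m / (m + u)  # beta values computed from correlated intensities
    print("theoretical rho:", intensity_correlation(2.0, 3.0, 1.0))
    print("empirical rho:  ", np.corrcoef(m, u)[0, 1])
    print("beta range:", beta.min(), beta.max())
```

A quick sanity check of the independent special case is to set k0 = 0 and confirm that the mean of M/(M+U) is close to k1/(k1 + k2).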

Conclusion: The proposed model contributes to an improved identification of associations between beta values and covariates such as clinical variables and lifestyle factors in epigenome-wide association studies. It is as easy to apply to a sample of beta values as beta regression and M-value regression.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1347-4) contains supplementary material, which is available to authorized users.

Fig. 2: Results obtained from the first part of the simulation study. The plots show the differences in the estimated rejection rates of the hypothesis H0: γ_gender = 0, as obtained from the RCG model, beta regression, and M-value regression (10,000 simulation runs). The covariate values of the HNR Study (n = 1,118) were used to generate the linear predictors X⊤γ. Beta values were generated from the distribution of the ratio in (9), using the sample estimates at CpG site cg00786084. High levels of the black and blue lines correspond to high power of the RCG-based tests. The vertical gray line marks the null hypothesis H0 (γ_gender = 0).

Figure 2 shows the differences in the fractions of tests that rejected the hypothesis H0: γ_gender = 0 at the 5% level for varying values of γ_gender and ρ. The RCG model had higher power than beta and M-value regression, especially when the effect size γ_gender took moderately high values. For large effect sizes, the power of the three models was similar. This is because large effect sizes led to high rejection rates of H0: γ_gender = 0 regardless of whether the correlation between the signal intensities was taken into account. As expected, the differences between the RCG model and the competing approaches increased with ρ. At the same time, the RCG-based type I error rates were close to the nominal significance level (0.054, 0.049, and 0.050 for ρ = 0.2, 0.5, and 0.93, respectively).
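The Monte Carlo standard error of a rejection rate p estimated from R simulation runs is sqrt(p(1 − p)/R), which helps put the reported type I error rates in context. The Python sketch below computes it. As a simplified and hypothetical illustration, it also estimates the type I error of M-value regression with gender as the only covariate. With a single binary covariate, the OLS t-test on its coefficient coincides with the pooled two-sample t-test (scipy.stats.ttest_ind with its default equal_var=True).

Several assumptions apply:
- Intensities come from an assumed shared-gamma construction, not necessarily the paper's bivariate gamma.
- The covariate is generated at random rather than taken from the HNR Study.
- The run count is reduced for speed.

```python
import numpy as np
from scipy import stats


def mc_standard_error(rate, n_runs):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return float(np.sqrt(rate * (1.0 - rate) / n_runs))


def mvalue_type1_error(n=1118, n_runs=1000, k0=2.0, k1=3.0, k2=3.0,
                       alpha=0.05, seed=1):
    """Fraction of runs in which a t-test on M-values rejects H0: gamma_gender = 0.

    Gender is a hypothetical random 0/1 covariate drawn independently of the
    intensities, so H0 is true and the fraction estimates the type I error.
    """
    rng = np.random.default_rng(seed)
    gender = rng.integers(0, 2, size=n)  # fixed across runs, like observed covariates
    size = (n_runs, n)
    # Assumed shared-component bivariate gamma for (M, U), one row per run.
    shared = rng.gamma(k0, 1.0, size=size)
    m = shared + rng.gamma(k1, 1.0, size=size)
    u = shared + rng.gamma(k2, 1.0, size=size)
    mvals = np.log2(m / u)
    # One pooled two-sample t-test per simulation run (row).
    pvals = stats.ttest_ind(mvals[:, gender == 1], mvals[:, gender == 0], axis=1).pvalue
    return float(np.mean(pvals < alpha))


if __name__ == "__main__":
    se = mc_standard_error(0.05, 10_000)
    print(f"MC standard error at 0.05 with 10,000 runs: {se:.4f}")  # 0.0022
    print("simulated M-value type I error:", mvalue_type1_error())
```

With 10,000 runs and a true rate of 0.05, the standard error is about 0.0022. The reported RCG rates of 0.054, 0.049, and 0.050 therefore all lie within about two standard errors of the nominal 5% level.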

