A mixture model approach to multiple testing for the genetic analysis of gene expression.

Dalmasso C, Pickrell J, Tuefferd M, Génin E, Bourgain C, Broët P - BMC Proc (2007)

Bottom Line: Here, we propose a finite mixture model to estimate the local FDR (lFDR), the FDR, and the false non-discovery rate (FNR) in variance-component linkage analysis. Our parametric approach allows empirical estimation of an appropriate distribution. The contribution of our model to estimation of FDR and related criteria is illustrated on the microarray expression profiles data set provided by the Genetic Analysis Workshop 15 Problem 1.


Affiliation: JE 2492 Universite Paris-Sud, Hôpital Paul Brousse - Batiment 15/16, 16 Avenue Paul Vaillant Couturier, Villejuif CEDEX 94807, France. dalmasso@vjf.inserm.fr

ABSTRACT
With the availability of very dense genome-wide maps of markers, multiple testing has become a major difficulty for genetic studies. In this context, the false-discovery rate (FDR) and related criteria are widely used. Here, we propose a finite mixture model to estimate the local FDR (lFDR), the FDR, and the false non-discovery rate (FNR) in variance-component linkage analysis. Our parametric approach allows empirical estimation of an appropriate distribution. The contribution of our model to estimation of FDR and related criteria is illustrated on the microarray expression profiles data set provided by the Genetic Analysis Workshop 15 Problem 1.
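For orientation, the three criteria named in the abstract can be written in the usual two-group mixture formulation. This is a sketch only. The symbols below ($\pi_0$ for the proportion of true null hypotheses, $f_0$ and $f_1$ for the null and alternative densities of the test statistic, $\Gamma$ for a rejection region) are assumed for illustration and are not necessarily the authors' notation or exact estimators.

```latex
\begin{align*}
f(x) &= \pi_0 f_0(x) + (1-\pi_0)\, f_1(x)
  && \text{(marginal density of the statistic)} \\
\mathrm{lFDR}(x) &= \frac{\pi_0 f_0(x)}{f(x)}
  && \text{(posterior probability of the null at } x\text{)} \\
\mathrm{FDR}(\Gamma) &= \frac{\pi_0 \int_{\Gamma} f_0(x)\,dx}{\int_{\Gamma} f(x)\,dx}
  && \text{(average lFDR over rejected tests)} \\
\mathrm{FNR}(\Gamma) &= \frac{(1-\pi_0) \int_{\Gamma^{c}} f_1(x)\,dx}{\int_{\Gamma^{c}} f(x)\,dx}
  && \text{(share of true alternatives among non-rejected tests)}
\end{align*}
```

All three quantities depend on the null density $f_0$, so a misspecified null carries through to every criterion.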




Figure 1: Histogram of the (non-)observed likelihood-ratio statistic, with the theoretical null hypothesis density and the marginal and null hypothesis densities estimated from the mixture model, for the DDX17 gene.

Mentions: Table 1 gives the estimated parameters of the two-component mixture model for the expression of each of the 10 genes (phenotypes). The estimated distribution parameters differed markedly from their theoretical values. Across the 10 selected genes, the largest differences between theoretical and empirical values were 0.11 for θ (PSPHL), 1.96 for α1 (DDX17), and 1.08 for β1 (ALG6). For example, Figure 1 shows the histogram of the (non-)observed likelihood-ratio statistic X for the DDX17 gene, with the theoretical null hypothesis density and the marginal and null hypothesis densities estimated from the mixture model superimposed. The marked difference between the theoretical and estimated distributions strongly supports using the estimated distribution rather than the theoretical one. As noted by Efron [8], such differences can substantially affect any simultaneous inference, including FDR estimation and FWER control. When the FWER was controlled at 5% with a classical Bonferroni procedure, p-values for the DDX17 gene calculated from the theoretical distribution yielded 52 significant results, whereas p-values calculated from the estimated distribution yielded only 13. In this example, relying on the theoretical distribution clearly tended to overestimate the number of significant results.
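The effect of the null distribution on a Bonferroni count can be illustrated with a minimal sketch. It assumes the usual asymptotic null for a variance-component linkage test, a 50:50 mixture of a point mass at 0 and a chi-square with 1 degree of freedom. The "estimated" null here is a hypothetical stand-in, a gamma with shape 1/2 and an inflated scale, not the distribution fitted in the paper. The function names and the statistics are invented for illustration.

```python
import math

def sf_theoretical_null(x):
    """P(X >= x) under the assumed theoretical null:
    0.5 * (point mass at 0) + 0.5 * chi-square(1)."""
    if x <= 0:
        return 1.0
    # Survival function of chi-square(1) is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(x / 2.0))

def sf_rescaled_null(x, theta=0.5, scale=2.0):
    """P(X >= x) under a hypothetical estimated null:
    mass theta at 0, otherwise Gamma(shape=1/2, scale=scale).
    With theta=0.5 and scale=2 this equals the theoretical null."""
    if x <= 0:
        return 1.0
    # Gamma(1/2, scale=s) is (s / 2) * chi-square(1), so SF = erfc(sqrt(x / s)).
    return (1.0 - theta) * math.erfc(math.sqrt(x / scale))

def bonferroni_count(stats, sf, alpha=0.05):
    """Number of tests whose p-value is at most alpha / m (Bonferroni)."""
    m = len(stats)
    return sum(1 for x in stats if sf(x) <= alpha / m)

# Illustrative likelihood-ratio statistics (made up, not from the data set).
stats = [0.0, 1.0, 5.0, 12.0, 20.0, 30.0]
n_theoretical = bonferroni_count(stats, sf_theoretical_null)
n_rescaled = bonferroni_count(stats, lambda x: sf_rescaled_null(x, scale=6.0))
print(n_theoretical, n_rescaled)
```

A heavier-tailed null gives larger p-values for the same statistics, so fewer tests pass the Bonferroni threshold. That is the direction of the 52 versus 13 difference reported for DDX17.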

