A mixture model approach to multiple testing for the genetic analysis of gene expression.

Dalmasso C, Pickrell J, Tuefferd M, Génin E, Bourgain C, Broët P - BMC Proc (2007)

Bottom Line: Here, we propose a finite mixture model to estimate the local FDR (lFDR), the FDR, and the false non-discovery rate (FNR) in variance-component linkage analysis. Our parametric approach allows empirical estimation of an appropriate distribution. The contribution of our model to estimation of FDR and related criteria is illustrated on the microarray expression profiles data set provided by the Genetic Analysis Workshop 15 Problem 1.


Affiliation: JE 2492 Universite Paris-Sud, Hôpital Paul Brousse - Batiment 15/16, 16 Avenue Paul Vaillant Couturier, Villejuif CEDEX 94807, France. dalmasso@vjf.inserm.fr

ABSTRACT
With the availability of very dense genome-wide maps of markers, multiple testing has become a major difficulty for genetic studies. In this context, the false-discovery rate (FDR) and related criteria are widely used. Here, we propose a finite mixture model to estimate the local FDR (lFDR), the FDR, and the false non-discovery rate (FNR) in variance-component linkage analysis. Our parametric approach allows empirical estimation of an appropriate distribution. The contribution of our model to estimation of FDR and related criteria is illustrated on the microarray expression profiles data set provided by the Genetic Analysis Workshop 15 Problem 1.




Figure 2: Estimated posterior probabilities (lFDR) for the 10 selected genes along the 22 chromosomes. Significant results at FDR threshold 0.05 are plotted in red.

Mentions: Summary statistics calculated from the full output of the MCMC algorithm (after discarding the burn-in samples) provide estimates of each marker's posterior probability of belonging to the null hypothesis component. Using these estimates, a probabilistic classification of the data (into discoveries and non-discoveries) can be obtained together with estimates of the FDR and FNR [10,11]. Here, we considered as discoveries (linkage) the markers with posterior probabilities below a threshold value, which can differ for each phenotype and was chosen to ensure an FDR of 5%. Figure 2 shows the estimated posterior probabilities (equivalent to the lFDR) along the 22 chromosomes for the 10 phenotypes; the selected markers, those with an lFDR estimate below the defined threshold, are plotted in red. The estimated FNR ranged from 23% (PSPHL) to 28% (HOMER1) (data not shown). The selected markers differed substantially from those obtained by Morley et al. [1]. For example, we found multiple cis-acting and trans-acting regulators for DDX7 and IL16, while Morley et al. [1] found only cis-acting regulators for these genes.
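The thresholding step described above can be illustrated with a minimal sketch. It assumes a commonly used estimator rather than the authors' exact procedure: the FDR of a selected set is estimated as the average lFDR over that set, and the FNR as the average of (1 - lFDR) over the markers left unselected. The function name `classify_by_lfdr`, its parameters, and the input values are hypothetical and chosen only for illustration.

```python
def classify_by_lfdr(lfdr, target_fdr=0.05):
    """Select markers so that the estimated FDR stays at or below target_fdr.

    lfdr: list of estimated posterior null probabilities, one per marker.
    Returns (selected_indices, threshold, est_fdr, est_fnr).
    Assumption: FDR and FNR are estimated by averaging posterior
    probabilities, which may differ from the estimator used in the paper.
    """
    # Visit markers from the smallest lFDR to the largest.
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected = []
    running_sum = 0.0
    for i in order:
        # Estimated FDR of a set = mean lFDR over the set. Because the
        # values are sorted, this mean can only grow, so stop at the
        # first marker that would push it above the target.
        if (running_sum + lfdr[i]) / (len(selected) + 1) > target_fdr:
            break
        running_sum += lfdr[i]
        selected.append(i)

    # Largest lFDR among the selected markers (None if nothing is selected).
    threshold = lfdr[selected[-1]] if selected else None
    est_fdr = running_sum / len(selected) if selected else 0.0

    # Estimated FNR = mean posterior probability of the alternative
    # among the markers that were not selected.
    chosen = set(selected)
    rest = [lfdr[i] for i in range(len(lfdr)) if i not in chosen]
    est_fnr = sum(1.0 - p for p in rest) / len(rest) if rest else 0.0
    return selected, threshold, est_fdr, est_fnr


# Made-up lFDR values for five markers of a single phenotype.
selected, threshold, est_fdr, est_fnr = classify_by_lfdr(
    [0.01, 0.02, 0.9, 0.08, 0.5], target_fdr=0.05
)
print(selected, threshold, round(est_fdr, 4), round(est_fnr, 4))
```

Because the threshold is derived from each phenotype's own lFDR values, it can differ between phenotypes, as the text notes. Markers tied at the threshold value are handled in input order here, which is a simplification.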

