Modeling gene expression measurement error: a quasi-likelihood approach.

Strimmer K - BMC Bioinformatics (2003)

Bottom Line: The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. In an example it also improved the power of tests to identify differential expression.


Affiliation: Department of Statistics, University of Munich, Ludwigstrasse 33, D-80539 Munich, Germany. strimmer@stat.uni-muenchen.de

ABSTRACT

Background: Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale).

Results: Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In the case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale and on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects.

Conclusions: The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression.
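
As a rough illustration of how an extended quasi-likelihood can be built from a variance function alone, the following minimal sketch may help. It assumes the quadratic variance function V(μ) = ρ² + σ²(μ - β)² (the abstract only says "quadratic", so this exact form is an assumption), uses the generic Nelder-Pregibon form of the extended quasi-likelihood with the dispersion fixed at 1, and introduces the hypothetical function names variance, quasi_deviance and extended_quasi_loglik. It is not claimed to reproduce the paper's implementation.

```python
import numpy as np

def variance(mu, sigma, beta, rho):
    # ASSUMPTION: quadratic variance function V(mu) = rho^2 + sigma^2 * (mu - beta)^2.
    return rho**2 + sigma**2 * (mu - beta)**2

def quasi_deviance(y, mu, sigma, beta, rho):
    """Quasi-deviance D(y; mu) = 2 * integral from mu to y of (y - t) / V(t) dt.

    For the quadratic V above the integral has a closed form
    (an arctan term minus a log term).
    """
    a = y - beta   # observation relative to background
    m = mu - beta  # mean relative to background
    atan_term = a / (rho * sigma) * (
        np.arctan(sigma * a / rho) - np.arctan(sigma * m / rho)
    )
    log_term = (
        np.log(variance(y, sigma, beta, rho)) - np.log(variance(mu, sigma, beta, rho))
    ) / (2.0 * sigma**2)
    return 2.0 * (atan_term - log_term)

def extended_quasi_loglik(y, mu, sigma, beta, rho):
    """Extended quasi-log-likelihood summed over observations.

    Generic form with dispersion fixed at 1 (an assumption of this sketch):
    Q+ = -0.5 * log(2 * pi * V(y)) - 0.5 * D(y; mu).
    """
    y = np.asarray(y, dtype=float)
    v_y = variance(y, sigma, beta, rho)
    d = quasi_deviance(y, mu, sigma, beta, rho)
    return np.sum(-0.5 * np.log(2.0 * np.pi * v_y) - 0.5 * d)
```

Because the function depends on the data only through the mean and the variance function, it could in principle be passed to a general-purpose optimizer to estimate μ, σ, β and ρ without specifying a full distribution.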


Figure 1: Variance-mean relationship for simulated data: true value (red), maximum-likelihood estimate β (green), maximum-quasi-likelihood estimate (blue)

Mentions: The simulated whole-chip data consisted of 7000 genes, with the coefficient of variation set to σ = 0.25 and the background parameters set to β = 25000 and ρ = 5000. The 7000 true expression levels μ_i - β were drawn randomly from a log-normal distribution (with log-mean 8 and standard deviation 2). These values were chosen to match the molecular data analyzed in [1]. Data y from the convolution model and the Gamma model were generated directly on the observed scale. To generate data y from the asinh-normal distribution, data x were drawn from a normal distribution N(u, s²) and subsequently transformed to the observed scale via y = asinh(a + bx). Note that this is possible because there exists a one-to-one mapping of the parameters on the normal scale (u, s², a, b) and those of the transformed scale (μ, σ, β, ρ), see Table 2. For each variant, 4, 10 and 20 replicates per gene were drawn. Figure 1 shows the observed mean-variance relationship of an example with 10 replicates per gene.
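
The Gamma variant of this simulation can be sketched roughly as follows. The parameter values (7000 genes, σ = 0.25, β = 25000, ρ = 5000, log-mean 8, log-standard deviation 2) come from the text above. The variance function Var(y) = ρ² + σ²(μ - β)² and the moment-matched Gamma parameterization are assumptions of this sketch, and the function name simulate_gamma_chip is hypothetical.

```python
import numpy as np

def simulate_gamma_chip(n_genes=7000, n_rep=10, sigma=0.25, beta=25000.0,
                        rho=5000.0, log_mean=8.0, log_sd=2.0, seed=0):
    """Simulate replicate intensities with a quadratic variance-mean relationship."""
    rng = np.random.default_rng(seed)

    # True expression levels mu_i - beta, drawn from a log-normal distribution.
    signal = rng.lognormal(mean=log_mean, sigma=log_sd, size=n_genes)
    mu = beta + signal

    # ASSUMPTION: quadratic variance function Var(y) = rho^2 + sigma^2 * (mu - beta)^2.
    var = rho**2 + sigma**2 * signal**2

    # Gamma distribution matched to the first two moments:
    # mean = shape * scale, variance = shape * scale^2.
    shape = mu**2 / var
    scale = var / mu

    # One row per gene, one column per replicate, generated on the observed scale.
    y = rng.gamma(shape[:, None], scale[:, None], size=(n_genes, n_rep))
    return mu, var, y

# Example: 10 replicates per gene, then the per-gene empirical mean and variance
# that a variance-mean plot would display.
mu, var, y = simulate_gamma_chip(n_rep=10)
gene_mean = y.mean(axis=1)
gene_var = y.var(axis=1, ddof=1)
```

Plotting gene_var against gene_mean on log axes, together with the assumed curve var against mu, would give a picture of the same kind as the variance-mean relationship described for Figure 1.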

