Transcriptional landscape estimation from tiling array data using a model of signal shift and drift.

Nicolas P, Leduc A, Robin S, Rasmussen S, Jarmer H, Bessières P - Bioinformatics (2009)

Bottom Line: Importantly, the model is also enriched and accounts for subtle effects such as signal 'drift' and covariates. Relevance of this framework is demonstrated on a Bacillus subtilis dataset. The software is distributed under the GPL.


Affiliation: INRA, Mathématique Informatique et Génome UR1077, 78350 Jouy-en-Josas, France. pierre.nicolas@jouy.inra.fr

ABSTRACT

Motivation: High-density oligonucleotide tiling array technology holds the promise of a better description of the complexity and the dynamics of transcriptional landscapes. In organisms such as bacteria and yeasts, transcription can be measured on a genome-wide scale with a resolution >25 bp. The statistical models currently used to handle these data remain, however, very simple, the most popular being the piecewise constant Gaussian model with a fixed number of breakpoints.
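For illustration, the baseline named above (a piecewise constant Gaussian model with a fixed number of breakpoints) can be fitted by least squares with dynamic programming. The sketch below is not taken from the article: the function name segment_fixed_k and the toy signal are hypothetical, and it assumes a common noise variance in all segments, so that maximizing the Gaussian likelihood amounts to minimizing the squared error.

```python
from itertools import accumulate


def segment_fixed_k(y, n_segments):
    """Least-squares piecewise-constant fit with exactly n_segments segments.

    Returns the sorted start indices of segments 2..n_segments (the
    breakpoints). Illustrative O(K * n^2) dynamic programme, not the
    authors' software.
    """
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums let us evaluate the cost of any interval in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(i, j):
        """Squared error of y[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Walk the back-pointers from the last segment to the first.
    breaks = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        breaks.append(j)
    return sorted(breaks)


# Toy signal with three plateaus (made-up values, for illustration only).
signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 2.0, 2.1, 1.9, 2.0]
print(segment_fixed_k(signal, 3))
```

The number of segments has to be supplied by the user, which is the difficulty the abstract refers to.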

Results: This article describes a new methodology based on a hidden Markov model that embeds the segmentation of a continuous-valued signal in a probabilistic setting. For a computationally affordable cost, this framework (i) alleviates the difficulty of choosing a fixed number of breakpoints, and (ii) permits retrieving more information than a unique segmentation by giving access to the whole probability distribution of the transcription profile. Importantly, the model is also enriched and accounts for subtle effects such as signal 'drift' and covariates. Relevance of this framework is demonstrated on a Bacillus subtilis dataset.
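To make the idea of a probabilistic segmentation concrete, here is a minimal sketch of a hidden Markov model whose hidden state is the signal level on a discrete grid, with posterior level probabilities obtained by the forward-backward algorithm. It is a deliberately simplified stand-in, not the authors' model: it has no drift, covariates or outlier component, the noise SD is constant, and the parameters p_shift and sigma are fixed by hand rather than estimated by maximum likelihood. The names posterior_levels, p_shift, sigma and the toy data are all assumptions introduced for this example.

```python
import math


def posterior_levels(y, levels, p_shift=0.05, sigma=0.5):
    """Posterior probability of each grid level at each position.

    Hidden state: index of the signal level in `levels` (at least two).
    Transition: keep the level with probability 1 - p_shift, otherwise
    jump uniformly to one of the other levels. Emission: Gaussian noise
    with standard deviation sigma around the level.
    """
    n, m = len(y), len(levels)
    if m < 2 or n == 0:
        raise ValueError("need at least two levels and one observation")
    jump = p_shift / (m - 1)

    def emission(obs):
        # Rescaled Gaussian likelihoods; the shift by dmin avoids underflow.
        d = [((obs - u) / sigma) ** 2 for u in levels]
        dmin = min(d)
        return [math.exp(-0.5 * (v - dmin)) for v in d]

    def propagate(vec):
        # Multiply by the (symmetric) transition matrix in O(m).
        total = sum(vec)
        return [(1.0 - p_shift) * v + jump * (total - v) for v in vec]

    def normalize(vec):
        total = sum(vec)
        return [v / total for v in vec]

    emis = [emission(obs) for obs in y]

    # Forward pass with a uniform prior on the initial level.
    alpha = [normalize(emis[0])]
    for t in range(1, n):
        pred = propagate(alpha[-1])
        alpha.append(normalize([p * e for p, e in zip(pred, emis[t])]))

    # Backward pass, rescaled at every step.
    beta = [[1.0] * m for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = [e * b for e, b in zip(emis[t + 1], beta[t + 1])]
        beta[t] = normalize(propagate(nxt))

    return [normalize([a * b for a, b in zip(alpha[t], beta[t])])
            for t in range(n)]


# Toy data: one shift from a level near 0 to a level near 3 (made-up values).
y = [0.1, -0.2, 0.0, 0.1, 3.1, 2.9, 3.0, 3.2]
levels = [0.0, 1.0, 2.0, 3.0]
post = posterior_levels(y, levels)
# Posterior mean of the signal level at each position.
mean_profile = [sum(p * u for p, u in zip(row, levels)) for row in post]
print([round(v, 2) for v in mean_profile])
```

Because each position carries a full distribution over levels, no number of breakpoints has to be fixed in advance, and uncertainty about where a shift occurs is visible in the posterior rather than hidden behind a single segmentation.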

Availability: The software is distributed under the GPL.

Figure 2: Parameter estimates. (A) Transition matrix π(u_t, u_{t+1}). One row is represented. (B) SD of the noise σ as a function of the underlying signal level u_t. (C) Outlier probability ϵ as a function of the magnitude of the gDNA residuals r_t (plain line) and complementary cumulative distribution function of the gDNA residuals (dotted line). (D) Proportionality factor ρ applied to r_t as a function of the signal level u_t.

Mentions: Parameter estimates in model-based analyses are an invaluable source of information for understanding the behavior of both the model and the data. The model contains a total of 23 parameters. Figure 2 provides an overview of their ML estimates on the B. subtilis data. The first row of Table 1 gives numerical values for a selection of parameters.

