Transcriptional landscape estimation from tiling array data using a model of signal shift and drift.

Nicolas P, Leduc A, Robin S, Rasmussen S, Jarmer H, Bessières P - Bioinformatics (2009)

Bottom Line: Importantly, the model is also enriched and accounts for subtle effects such as signal 'drift' and covariates. Relevance of this framework is demonstrated on a Bacillus subtilis dataset. The software is distributed under the GPL.


Affiliation: INRA, Mathématique Informatique et Génome UR1077, 78350 Jouy-en-Josas, France. pierre.nicolas@jouy.inra.fr

ABSTRACT

Motivation: High-density oligonucleotide tiling array technology holds the promise of a better description of the complexity and the dynamics of transcriptional landscapes. In organisms such as bacteria and yeasts, transcription can be measured on a genome-wide scale with a resolution >25 bp. The statistical models currently used to handle these data remain, however, very simple; the most popular is the piecewise constant Gaussian model with a fixed number of breakpoints.

Results: This article describes a new methodology based on a hidden Markov model that embeds the segmentation of a continuous-valued signal in a probabilistic setting. For a computationally affordable cost, this framework (i) alleviates the difficulty of choosing a fixed number of breakpoints, and (ii) permits retrieving more information than a unique segmentation by giving access to the whole probability distribution of the transcription profile. Importantly, the model is also enriched and accounts for subtle effects such as signal 'drift' and covariates. Relevance of this framework is demonstrated on a Bacillus subtilis dataset.

Availability: The software is distributed under the GPL.
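For readers unfamiliar with the baseline named in the Motivation, the piecewise constant Gaussian model with a fixed number of breakpoints can be fitted by least squares with a simple dynamic program. The sketch below is only an illustration of that baseline, not code from the authors' software; the function name `segment_fixed_k` and its arguments are hypothetical, and a common noise variance across segments is assumed so that maximum likelihood reduces to minimising the residual sum of squares.

```python
from itertools import accumulate


def segment_fixed_k(y, n_segments):
    """Least-squares piecewise-constant fit with exactly n_segments segments.

    Illustrative baseline only (hypothetical helper, not the authors' code).
    Returns the sorted start indices of segments 2..n_segments.
    """
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums give the cost of any segment in constant time.
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Walk back from the end to recover where each segment starts.
    breaks = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        breaks.append(j)
    return sorted(breaks)
```

The number of segments has to be supplied by the user, which is the difficulty the abstract says the hidden Markov model alleviates. The triple loop costs O(K n^2) for n probes and K segments, so this sketch is suited to short regions only.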


Figure 3: Transcriptional landscape inference. Analysis of the signal on the (+)-strand of a 10 000 bp segment of the B. subtilis chromosome. Upper part: open circles show the original signal. Closed gray circles represent the signal after ‘correction’ with the gDNA covariate. The thick black line shows the expectation of the transcript level as computed with the HMM. Thin black lines correspond to the 95% CI. Middle part: horizontal arrows indicate GenBank CDSs. Lower part: shift moves along the most likely trajectory are shown as squares. Upward and downward drift moves are indicated by point-up and point-down triangles, respectively. Move probabilities are represented as gray lines.

Mentions: The adoption of a probabilistic setting for the trajectory of the underlying signal allows for a considerably richer signal reconstruction than just ‘optimal’ trajectory reconstruction. Figure 3 illustrates these possibilities by superimposing a number of results obtained with the model on a 10 000 bp region of the B. subtilis chromosome. Results include: (i) the prediction interval for the value of the signal u_t at each chromosome position; (ii) a point prediction for the signal value by the conditional mean of u_t (the best predictor in terms of quadratic error); (iii) the inferred position of the experimental point after correction for differential probe affinity [computed as ]; (iv) the exact position of each type of move in the best trajectory given by the Viterbi path (abrupt shift, upward drift and downward drift); and (v) the probability of having each type of move at each position. All these values can be read directly from the output of our software.
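To make items (i), (ii) and (v) concrete, here is a minimal forward-backward sketch on a deliberately simplified model. It is not the authors' model or software: it assumes the signal takes values on a small, ascending grid of discrete levels, allows only two kinds of move (stay, or shift to any other level with equal probability), and ignores drift moves and covariates. The function name `posterior_summary` and the parameters `levels`, `sigma`, `p_move` and `coverage` are hypothetical.

```python
import math


def posterior_summary(y, levels, sigma, p_move, coverage=0.95):
    """Forward-backward on a toy 'stay or shift' HMM (illustrative only).

    levels must be sorted in ascending order and contain at least two values.
    Returns three lists, one entry per position: the posterior mean, a
    (lower, upper) interval read off the level grid, and the posterior
    probability that the level changed since the previous position
    (0.0 at the first position).
    """
    n, k = len(y), len(levels)
    stay, jump = 1.0 - p_move, p_move / (k - 1)

    # Gaussian emission weights, rescaled per position to avoid underflow.
    emis = []
    for obs in y:
        logs = [-0.5 * ((obs - lv) / sigma) ** 2 for lv in levels]
        top = max(logs)
        emis.append([math.exp(v - top) for v in logs])

    # Forward pass; c[t] stores the normalisation constant of each step.
    alpha, c = [], []
    prev = [1.0 / k] * k  # assumed uniform prior on the initial level
    for t in range(n):
        if t == 0:
            pred = prev
        else:
            pred = [stay * a + jump * (1.0 - a) for a in prev]
        row = [e * q for e, q in zip(emis[t], pred)]
        norm = sum(row)
        prev = [v / norm for v in row]
        alpha.append(prev)
        c.append(norm)

    # Backward pass, scaled with the same constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        w = [e * b for e, b in zip(emis[t + 1], beta[t + 1])]
        total = sum(w)
        beta[t] = [(stay * v + jump * (total - v)) / c[t + 1] for v in w]

    means, intervals, move_prob = [], [], []
    tail = (1.0 - coverage) / 2.0
    for t in range(n):
        gamma = [a * b for a, b in zip(alpha[t], beta[t])]
        z = sum(gamma)
        gamma = [g / z for g in gamma]
        means.append(sum(g * lv for g, lv in zip(gamma, levels)))

        # Interval bounds: grid levels at the lower and upper tail quantiles.
        cum, lo, hi = 0.0, None, levels[-1]
        for g, lv in zip(gamma, levels):
            cum += g
            if lo is None and cum >= tail:
                lo = lv
            if cum >= 1.0 - tail:
                hi = lv
                break
        intervals.append((lo, hi))

        # Probability of a move = 1 - probability of staying on the same level.
        if t == 0:
            move_prob.append(0.0)
        else:
            same = sum(
                alpha[t - 1][j] * stay * emis[t][j] * beta[t][j] for j in range(k)
            ) / c[t]
            move_prob.append(max(0.0, 1.0 - same))
    return means, intervals, move_prob
```

Because the two-move transition structure is so regular, each forward or backward step costs O(K) for K levels rather than O(K^2). The Viterbi path of item (iv) and the covariate correction of item (iii) are left out of this sketch.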

