Parameter estimation for robust HMM analysis of ChIP-chip data.

Humburg P, Bulger D, Stone G - BMC Bioinformatics (2008)

Bottom Line: Although hidden Markov models have been used successfully to analyse tiling array data, parameter estimation for these models is typically ad hoc. We illustrate an efficient parameter estimation procedure that can be used for HMM-based methods in general and leads to a clear increase in performance when compared to the use of ad hoc estimates. The resulting hidden Markov model outperforms established methods like TileMap in the context of histone modification studies.


Affiliation: Department of Statistics, Macquarie University, North Ryde, NSW 2109, Australia. peter.humburg@csiro.au

ABSTRACT

Background: Tiling arrays are an important tool for the study of transcriptional activity, protein-DNA interactions and chromatin structure on a genome-wide scale at high resolution. Although hidden Markov models have been used successfully to analyse tiling array data, parameter estimation for these models is typically ad hoc. Especially in the context of ChIP-chip experiments, no standard procedures exist to obtain parameter estimates from the data. Common methods for the calculation of maximum likelihood estimates such as the Baum-Welch algorithm or Viterbi training are rarely applied in the context of tiling array analysis.

Results: Here we develop a hidden Markov model for the analysis of chromatin structure ChIP-chip tiling array data, using t emission distributions to increase robustness towards outliers. Maximum likelihood estimates are used for all model parameters. Two different approaches to parameter estimation are investigated and combined into an efficient procedure.
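
As a rough illustration of what t emission distributions mean in practice, the sketch below computes per-probe posterior state probabilities for a small HMM whose emissions follow location-scale Student t distributions. It is a minimal sketch, not the authors' implementation: the two-state layout, the function names `t_logpdf` and `posterior_state_probs`, and all parameter values are assumptions chosen for illustration.

```python
import math


def t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student t distribution."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def posterior_state_probs(obs, init, trans, params):
    """Scaled forward-backward pass.

    obs    : list of probe statistics
    init   : initial state probabilities
    trans  : transition matrix, trans[i][j] = P(next = j | current = i)
    params : one (mu, sigma, nu) tuple per state (hypothetical values)
    Returns P(state | all observations) for every probe.
    """
    n, T = len(init), len(obs)
    emit = [[math.exp(t_logpdf(x, *params[s])) for s in range(n)] for x in obs]

    # Forward pass, normalising at each step to avoid underflow.
    alpha, scale = [], []
    cur = [init[s] * emit[0][s] for s in range(n)]
    for t in range(T):
        if t > 0:
            cur = [emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        c = sum(cur)
        alpha.append([a / c for a in cur])
        scale.append(c)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]

    # With this scaling the product is already the posterior; renormalise
    # only to absorb floating-point rounding.
    post = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(row)
        post.append([r / total for r in row])
    return post


# Hypothetical example: state 0 = not enriched, state 1 = enriched.
obs = [0.1, -0.2, 0.0, 2.1, 1.9, 2.3, 0.1, -0.1]
params = [(0.0, 0.5, 4.0), (2.0, 0.5, 4.0)]
post = posterior_state_probs(obs, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], params)
```

The heavy tails of the t density are what provide the robustness: an outlying probe value lowers the emission probability far less than it would under a normal distribution, so a single extreme probe is less likely to flip the inferred state.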

Conclusion: We illustrate an efficient parameter estimation procedure that can be used for HMM-based methods in general and leads to a clear increase in performance when compared to the use of ad hoc estimates. The resulting hidden Markov model outperforms established methods like TileMap in the context of histone modification studies.


Figure 2: Error rate for different models on datasets I and II. Error rate resulting from the different models on dataset I (left) and II (right). When the total number of incorrect probe calls is considered, both parameter estimation procedures outperform TileMap on dataset I for cut-offs larger than 0.2. Both Baum-Welch and Viterbi training provide models with an optimal cut-off close to 0.5, while TileMap significantly underestimates the posterior probability resulting in an optimal cut-off of 0.19. The models with optimised parameters show similar performance on both datasets. On dataset II TileMap's performance is reduced in comparison to the results on dataset I. The main differences between the models considered here occur at error rates of 0–0.08. The relevant area of the figures in the top row is magnified in the plots below.

Mentions: We now consider the performance of both the Baum-Welch procedure and Viterbi training when all model parameters, including the degrees of freedom ν, are estimated from the data. Both parameter estimation methods are used to fit an HMM to datasets I and II, and the performance of the resulting models is assessed in terms of the achieved error rate (Figure 2), ROC curves (Figure 3) and their associated AUC (Table 1) for both datasets. To assess how well these methods perform in comparison to an established algorithm, we also fit a TileMap model to the two simulated datasets. The three models are compared to each other, as well as to an ad hoc model which simply uses, without optimisation, the initial parameter estimates used by the two parameter optimisation methods. When comparing the performance of these models on both simulated datasets, it is important to consider that the simulation procedure introduces a bias towards the underlying model.
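
The evaluation measures named above can be sketched as follows, assuming each probe has a posterior probability of enrichment and a known true label from the simulation. This is an illustrative sketch only: the helper names `error_rate`, `auc` and `best_cutoff`, and the toy data, are hypothetical and not taken from the paper.

```python
def error_rate(posteriors, truth, cutoff):
    """Fraction of probes called incorrectly when a probe is called
    enriched if its posterior probability is at least `cutoff`."""
    wrong = sum((p >= cutoff) != bool(t) for p, t in zip(posteriors, truth))
    return wrong / len(truth)


def auc(posteriors, truth):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a random enriched probe scores higher than a random
    non-enriched probe, with ties counted as one half.
    Quadratic in the number of probes, which is fine for a sketch."""
    pos = [p for p, t in zip(posteriors, truth) if t]
    neg = [p for p, t in zip(posteriors, truth) if not t]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(posteriors, truth, grid):
    """Cut-off from `grid` with the lowest error rate (first one on ties)."""
    return min(grid, key=lambda c: error_rate(posteriors, truth, c))


# Toy data (hypothetical): 1 = truly enriched, 0 = not enriched.
posteriors = [0.1, 0.4, 0.35, 0.8]
truth = [0, 0, 1, 1]
print(error_rate(posteriors, truth, 0.5))  # 0.25
print(auc(posteriors, truth))              # 0.75
```

A model whose posterior probabilities are well calibrated should reach its lowest error rate near a cut-off of 0.5, which is the behaviour the Figure 2 caption reports for the Baum-Welch and Viterbi-trained models; an optimal cut-off far from 0.5, such as the 0.19 reported for TileMap, indicates posterior probabilities that are systematically too low.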

