Parameter estimation for robust HMM analysis of ChIP-chip data.

Humburg P, Bulger D, Stone G - BMC Bioinformatics (2008)

Bottom Line: Although hidden Markov models have been used successfully to analyse tiling array data, parameter estimation for these models is typically ad hoc. We illustrate an efficient parameter estimation procedure that can be used for HMM based methods in general and leads to a clear increase in performance when compared to the use of ad hoc estimates. The resulting hidden Markov model outperforms established methods like TileMap in the context of histone modification studies.


Affiliation: Department of Statistics, Macquarie University, North Ryde, NSW 2109, Australia. peter.humburg@csiro.au

ABSTRACT

Background: Tiling arrays are an important tool for the study of transcriptional activity, protein-DNA interactions and chromatin structure on a genome-wide scale at high resolution. Although hidden Markov models have been used successfully to analyse tiling array data, parameter estimation for these models is typically ad hoc. Especially in the context of ChIP-chip experiments, no standard procedures exist to obtain parameter estimates from the data. Common methods for the calculation of maximum likelihood estimates such as the Baum-Welch algorithm or Viterbi training are rarely applied in the context of tiling array analysis.

Results: Here we develop a hidden Markov model for the analysis of chromatin structure ChIP-chip tiling array data, using t emission distributions to increase robustness towards outliers. Maximum likelihood estimates are used for all model parameters. Two different approaches to parameter estimation are investigated and combined into an efficient procedure.

Conclusion: We illustrate an efficient parameter estimation procedure that can be used for HMM based methods in general and leads to a clear increase in performance when compared to the use of ad hoc estimates. The resulting hidden Markov model outperforms established methods like TileMap in the context of histone modification studies.


Figure 4: Model performance for different choices of ν. The Baum-Welch model (red) performs better for relatively small values of ν while Viterbi training (blue) favours larger ν. For the optimal choice of ν the Baum-Welch parameter estimates lead to an optimal cut-off close to 0.5.

Mentions: Estimating ν, the degrees of freedom, for t distributions from the data is time-consuming and may not be very accurate, especially for relatively large values of ν. In this section we investigate the effect of fixing ν a priori for both states of the model. Only the case ν₁ = ν₂ is considered here. The remaining parameters are estimated from the training data using the Baum-Welch algorithm and Viterbi training with ν = 3, 4, ..., 50. For each value of ν, we report the error rate (Figure 4) as well as the AUC (Figure 5) on the simulated data.
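The sweep over fixed values of ν can be illustrated with a minimal sketch. This is not the authors' code: the simulated data, the parameter values (`mu`, `sigma`, `trans`, `init`, a true ν of 5) and all function names are assumptions made for illustration. To keep the example short, the remaining parameters are held at their simulation values rather than re-estimated by Baum-Welch or Viterbi training for each ν, so the sketch only shows how an error rate at a posterior cut-off of 0.5 and a rank-based AUC might be computed for a two-state HMM with t emissions.

```python
import math
import numpy as np

def t_logpdf(x, nu, mu, sigma):
    """Log density of a location-scale Student t distribution."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * np.log1p(z * z / nu))

def posterior_state1(x, nu, mu, sigma, trans, init):
    """Scaled forward-backward pass; returns P(state = 1 | x) per probe."""
    emis = np.exp(np.stack(
        [t_logpdf(x, nu, mu[k], sigma[k]) for k in (0, 1)], axis=1))
    n = len(x)
    alpha, beta, scale = np.zeros((n, 2)), np.ones((n, 2)), np.zeros(n)
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(n - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    return gamma[:, 1] / gamma.sum(axis=1)

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical simulation settings (not taken from the paper).
rng = np.random.default_rng(0)
mu, sigma = np.array([0.0, 2.0]), np.array([1.0, 1.0])
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
init = np.array([0.5, 0.5])
n = 2000

# Simulate a two-state Markov chain with t-distributed emissions.
states = np.zeros(n, dtype=int)
states[0] = int(rng.random() < init[1])
for t in range(1, n):
    states[t] = int(rng.random() < trans[states[t - 1], 1])
x = mu[states] + sigma[states] * rng.standard_t(5, size=n)

# Sweep nu = 3, 4, ..., 50 with the same nu in both states.
results = []
for nu in range(3, 51):
    post = posterior_state1(x, nu, mu, sigma, trans, init)
    error_rate = float(np.mean((post > 0.5).astype(int) != states))
    results.append((nu, error_rate, float(auc(post, states))))

best_nu, best_err, best_auc = min(results, key=lambda r: r[1])
print(f"lowest error rate at nu={best_nu}: error={best_err:.3f}, AUC={best_auc:.3f}")
```

In a fuller version, each iteration of the loop would first re-estimate the locations, scales and transition probabilities for the given ν before computing the posterior probabilities.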

