MARZ: an algorithm to combinatorially analyze gapped n-mer models of transcription factor binding.

Zellers RG, Drewell RA, Dresch JM - BMC Bioinformatics (2015)


Affiliation: Department of Computer Science, Harvey Mudd College, 301 Platt Boulevard, Claremont CA, 91711, USA. rzellers@hmc.edu.

ABSTRACT

Background: A key challenge in understanding the molecular mechanisms that control gene regulation is the characterization of the specificity with which transcription factor proteins bind to specific DNA sequences. A number of computational approaches have been developed to examine these interactions, including simple mononucleotide and dinucleotide position weight matrix models.
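The mononucleotide and dinucleotide position weight matrix (PWM) models mentioned above can be illustrated with a minimal Python sketch. It assumes equal-length, pre-aligned sites over A/C/G/T, a pseudocount of 1 and a uniform background. The function names and toy sites are hypothetical and are not taken from MARZ. The mononucleotide matrix scores each position independently. The dinucleotide matrix scores overlapping adjacent pairs, so it can capture dependencies between neighboring bases.

```python
import math
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT


def build_pwm(sites, pseudocount=1.0):
    """Mononucleotide log-odds PWM: one {base: log2(p/q)} dict per position.

    Assumes equal-length aligned sites over ACGT and a uniform background q = 1/4.
    """
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def build_dinuc_pwm(sites, pseudocount=1.0):
    """Dinucleotide log-odds PWM over overlapping adjacent pairs (i, i + 1).

    Assumes a uniform dinucleotide background q = 1/16.
    """
    pwm = []
    for i in range(len(sites[0]) - 1):
        counts = {d: pseudocount for d in DINUCS}
        for s in sites:
            counts[s[i:i + 2]] += 1
        total = sum(counts.values())
        pwm.append({d: math.log2((counts[d] / total) / (1 / 16)) for d in DINUCS})
    return pwm


def score_site(pwm, site):
    """Sum of per-position log-odds; positions are treated as independent."""
    return sum(col[base] for col, base in zip(pwm, site))


def score_dinuc(pwm, site):
    """Sum of log-odds over each adjacent pair in the site."""
    return sum(col[site[i:i + 2]] for i, col in enumerate(pwm))


# Hypothetical toy sites, for illustration only.
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
mono = build_pwm(sites)
di = build_dinuc_pwm(sites)
print(score_site(mono, "ACGT") > score_site(mono, "GGGG"))  # True
```

A site that matches the aligned examples scores higher than an unrelated one under both models. The dinucleotide model has 16 parameters per pair rather than 4 per position, so it needs more training sites to estimate reliably.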

Results: Here we develop a novel, unbiased computational algorithm, MARZ, that systematically analyzes all possible gapped matrices across a fixed number of nucleotides. In addition, to evaluate the ability of these matrix models to predict in vivo binding sites, we use a new scoring system and, in combination with established scoring methods and statistical analysis, test the performance of 32 different gapped matrices on the well-characterized HUNCHBACK transcription factor in Drosophila.
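One way to picture "all possible gapped matrices across a fixed number of nucleotides" is to enumerate layouts in which n informative positions are spread over a window, with gap positions in between. The sketch below is a hypothetical illustration of that idea and is not MARZ's own enumeration. The layout encoding ('N' for a modeled position, '-' for a gap), the `max_span` parameter and the helper names are assumptions. The counts it produces are not claimed to reproduce the 32 matrices evaluated in the paper.

```python
from itertools import combinations


def gapped_patterns(n, max_span):
    """Enumerate layouts with n informative positions ('N') and gaps ('-').

    Every layout starts and ends with 'N', so no layout is merely a shifted
    copy of another; spans run from n (no gaps) up to max_span.
    """
    if n < 2:
        raise ValueError("a gapped layout needs at least two informative positions")
    patterns = []
    for span in range(n, max_span + 1):
        interior = range(1, span - 1)  # positions that may be informative or gaps
        for chosen in combinations(interior, n - 2):
            informative = {0, span - 1, *chosen}
            patterns.append("".join("N" if i in informative else "-" for i in range(span)))
    return patterns


def informative_bases(window, pattern):
    """Keep only the bases that fall on 'N' positions of a layout."""
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    return "".join(b for b, p in zip(window, pattern) if p == "N")


print(gapped_patterns(2, 4))  # ['NN', 'N-N', 'N--N']
print(gapped_patterns(3, 4))  # ['NNN', 'NN-N', 'N-NN']
print(informative_bases("ACGT", "N--N"))  # AT
```

For n = 2, the layouts are an adjacent dinucleotide ("NN") and pairs of bases separated by one or two gaps. Under this assumed encoding, a gapped matrix would tabulate the frequencies of the bases returned by `informative_bases` at each placement of a layout along the aligned sites.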

Conclusions: Our results indicate that in many cases gapped matrix models can outperform traditional models, but that the relative strength of the binding sites considered in the analysis can profoundly influence the predictive ability of specific models.



Figure 7: Comparison of the performance of all gapped n-mer matrices to the traditional mononucleotide matrix m for HB. (A and C) Chi-square values, with significance color-coded: green (p < 0.01), aqua (p < 0.05). See Materials and Methods for a description of the Chi-square analysis. (B and D) Pearson correlation distance from the mononucleotide matrix. See Materials and Methods for a description of the calculation. Panels A and B were obtained using a threshold of 0.0, and panels C and D using a threshold of 1.0.
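The thresholds of 0.0 and 1.0 in the caption can be read, as an assumption, as cutoffs on a log-odds score: a window counts as a predicted site when its score reaches the threshold. The sketch below scans one strand only and uses `>=` as the comparison. Both choices, and the helper names, are hypothetical rather than taken from the paper. It also builds a per-nucleotide coverage vector (1 where a predicted site overlaps a position), which is the kind of representation a single-nucleotide comparison between matrices needs.

```python
def scan(pwm, seq, threshold):
    """Return start offsets of windows whose log-odds score is >= threshold.

    pwm: list of {base: log-odds} dicts, one per motif position.
    Scans the given strand only; the reverse complement is not considered.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append(start)
    return hits


def coverage(pwm, seq, threshold):
    """Per-nucleotide 0/1 vector marking positions covered by a predicted site."""
    covered = [0] * len(seq)
    for start in scan(pwm, seq, threshold):
        for i in range(start, start + len(pwm)):
            covered[i] = 1
    return covered


# Hypothetical two-position matrix that favors the dinucleotide "AT".
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]
print(scan(toy_pwm, "GATTAT", 0.0))      # [1, 2, 4]
print(scan(toy_pwm, "GATTAT", 1.0))      # [1, 4]
print(coverage(toy_pwm, "GATTAT", 1.0))  # [0, 1, 1, 0, 1, 1]
```

Raising the threshold from 0.0 to 1.0 drops the weaker "TT" window at offset 2. This toy case illustrates why results computed at different thresholds can differ.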

Mentions: To quantify the significance of the performance difference between each matrix and the traditional mononucleotide matrix m, we analyze Chi-square and Pearson correlation coefficient values (Figure 7 and Additional file 1: Figure S2). For the Chi-square analysis, we consider how frequently a particular matrix identifies a predicted HB binding site in ‘real’ ChIP peaks relative to ‘scrambled’ ChIP peaks (see Materials and Methods for details). This analysis does not account for whether a matrix and the mononucleotide matrix m identify sites in the same individual ChIP peaks. To address this issue, we also calculate the Pearson correlation coefficient, which compares, at single-nucleotide resolution, the binding sites predicted within each ChIP peak by a given matrix with those predicted by the mononucleotide matrix m.
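The two comparisons described above can be sketched in Python as follows. The chi-square function assumes a 2x2 table of peaks with and without at least one predicted site, in real versus scrambled peaks. It computes the Pearson statistic without a continuity correction and takes the p-value from the chi-square distribution with 1 degree of freedom. Scrambling is assumed to be a composition-preserving shuffle, and the correlation distance is assumed to be 1 - r. These are illustrative assumptions; the exact procedures are given in the paper's Materials and Methods.

```python
import math
import random


def scramble(seq, seed=0):
    """Shuffle a peak sequence; mononucleotide composition is preserved."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def chi_square_2x2(real_hits, real_total, scr_hits, scr_total):
    """Pearson chi-square for peaks with/without a predicted site,
    real vs scrambled (no continuity correction, 1 degree of freedom).
    Returns (statistic, p_value)."""
    table = [[real_hits, real_total - real_hits],
             [scr_hits, scr_total - scr_hits]]
    n = real_total + scr_total
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Assumed definition: 1 - r (0 = identical predictions, 2 = opposite)."""
    return 1.0 - pearson(x, y)


# Hypothetical counts: 30 of 50 real peaks vs 10 of 50 scrambled peaks have a hit.
stat, p = chi_square_2x2(30, 50, 10, 50)
print(round(stat, 2), p < 0.01)  # 16.67 True
# Hypothetical per-nucleotide coverage vectors for one peak: a gapped matrix vs m.
print(correlation_distance([0, 1, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0]))
```

Under the 1 - r assumption, a correlation distance near 0 means a gapped matrix marks nearly the same nucleotides as m within a peak, and larger values mean its predictions diverge. A peak in which either matrix predicts no sites gives a constant vector, so the sketch raises an error rather than returning a value.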

