MARZ: an algorithm to combinatorially analyze gapped n-mer models of transcription factor binding.

Zellers RG, Drewell RA, Dresch JM - BMC Bioinformatics (2015)

Bottom Line: A number of computational approaches have been developed to examine these interactions, including simple mononucleotide and dinucleotide position weight matrix models. Here we develop a novel, unbiased computational algorithm, MARZ, that systematically analyzes all possible gapped matrices across a fixed number of nucleotides. Our results indicate that in many cases gapped matrix models can outperform traditional models, but that the relative strength of the binding sites considered in the analysis can profoundly influence the predictive ability of specific models.


Affiliation: Department of Computer Science, Harvey Mudd College, 301 Platt Boulevard, Claremont CA, 91711, USA. rzellers@hmc.edu.

ABSTRACT

Background: A key challenge in understanding the molecular mechanisms that control gene regulation is the characterization of the specificity with which transcription factor proteins bind to specific DNA sequences. A number of computational approaches have been developed to examine these interactions, including simple mononucleotide and dinucleotide position weight matrix models.

Results: Here we develop a novel, unbiased computational algorithm, MARZ, that systematically analyzes all possible gapped matrices across a fixed number of nucleotides. In addition, to evaluate the ability of these matrix models to predict in vivo binding sites, we utilize a new scoring system and, in combination with established scoring methods and statistical analysis, test the performance of 32 different gapped matrices on the well-characterized HUNCHBACK transcription factor in Drosophila.

Conclusions: Our results indicate that in many cases gapped matrix models can outperform traditional models, but that the relative strength of the binding sites considered in the analysis can profoundly influence the predictive ability of specific models.
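
To make the idea of "all possible gapped matrices across a fixed number of nucleotides" concrete, the following is a minimal Python sketch of one way such patterns could be enumerated. It is not the authors' implementation. The names gapped_patterns and extract, the '1'/'x' notation, and the window width of 6 are assumptions made for illustration; the abstract does not state the window width. Under these assumptions (first position always read, trailing gaps trimmed), a window of 6 happens to yield 32 patterns, with "1" and "11" corresponding to the mononucleotide and dinucleotide cases.

```python
from itertools import product


def gapped_patterns(window: int) -> list[str]:
    """Enumerate gapped n-mer patterns that fit in `window` positions.

    '1' marks a position that is read and 'x' marks a gap (hypothetical
    notation). The first position is always read, and trailing gaps are
    trimmed so that every pattern starts and ends with '1'.
    """
    patterns = set()
    for rest in product("1x", repeat=window - 1):
        patterns.add(("1" + "".join(rest)).rstrip("x"))
    # Shortest patterns first, then lexicographic order within a length.
    return sorted(patterns, key=lambda p: (len(p), p))


def extract(seq: str, pattern: str, start: int) -> str:
    """Return the nucleotides of `seq` read by `pattern` at offset `start`."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


if __name__ == "__main__":
    pats = gapped_patterns(6)  # window width of 6 is an assumption
    print(len(pats), pats[:4])
    print(extract("ACGTAC", "1x1", 0))
```

Each pattern would then define which positions contribute to one matrix; how MARZ builds and scores those matrices is not described in this excerpt.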



Figure 6: AUROC and RZ score evaluation for all 32 gapped n-mer matrices for HB. This heatmap summarizes the results shown in Figures 3 and 4. The first column lists each of the 32 gapped n-mers. The second column contains the AUROC score obtained from each gapped n-mer’s ROC curve. The third through seventh columns contain the RZ scores obtained from each gapped n-mer at the threshold positions 0.0, 0.25, 0.5, 0.75, and 1.0, respectively. For columns two through seven, the scores are color-coded with green, yellow, and red for high, medium, and low values, respectively.

Mentions: A summary of the AUROC and RZ scores is shown in Figure 6. This figure again emphasizes that many of the gapped n-mers outperform the traditional mono- and dinucleotide matrices, and that this performance is highly dependent on the threshold position used.
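
For readers unfamiliar with the AUROC column, the sketch below shows the standard rank-based definition of the area under the ROC curve: the probability that a randomly chosen true binding site scores higher than a randomly chosen non-site, with ties counted as one half. The function name auroc and the example scores are hypothetical and are not taken from the paper. The RZ score is not defined in this excerpt, so no attempt is made to reproduce it here.

```python
def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve from raw scores (pairwise formulation).

    Counts the fraction of (positive, negative) pairs in which the
    positive scores higher; ties contribute one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical matrix scores for known sites and for background sequence.
    print(auroc([0.9, 0.4], [0.5, 0.1]))
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to a model that ranks every site above every non-site.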

