A linear model for transcription factor binding affinity prediction in protein binding microarrays.

Annala M, Laurila K, Lähdesmäki H, Nykter M - PLoS ONE (2011)

Bottom Line: Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.


Affiliation: Department of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.annala@tut.fi

ABSTRACT
Protein binding microarrays (PBMs) are a high-throughput technology used to characterize protein-DNA binding. The arrays measure a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive catalog of binding specificity. We present a linear model for predicting the binding affinity of a protein toward DNA sequences based on PBM data. Our model represents the measured intensity of an individual probe as a sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding to arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for identifying transcription factors from their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.
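The additive idea in the abstract (probe intensity as a sum of per-subsequence affinity contributions) can be illustrated with a small sketch. It assumes that a probe's features are the overlapping counts of every K-mer for a chosen set of K values, and that the per-K-mer affinities are fitted by ridge-regularised least squares. The K range, the penalty `lam`, the helper names (`all_kmers`, `kmer_counts`, `solve`, `fit_affinities`, `predict`) and the synthetic data are hypothetical; the paper's actual fitting procedure is not described in this excerpt.

```python
import random
from itertools import product

# Hypothetical sketch of an additive K-mer affinity model (not the authors' code).


def all_kmers(ks):
    """Feature vocabulary: every DNA K-mer for each K in ks."""
    return ["".join(p) for k in ks for p in product("ACGT", repeat=k)]


def kmer_counts(seq, vocab):
    """Overlapping occurrence count of each vocabulary K-mer in seq."""
    index = {km: i for i, km in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for k in {len(km) for km in vocab}:
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:
                counts[j] += 1.0
    return counts


def solve(a, b):
    """Solve a x = b (a square, nonsingular) by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affinities(probes, intensities, vocab, lam=1e-6):
    """Ridge least squares: minimise ||X w - y||^2 + lam * ||w||^2,
    where row i of X holds the K-mer counts of probe i."""
    X = [kmer_counts(p, vocab) for p in probes]
    d = len(vocab)
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, intensities)) for i in range(d)]
    return dict(zip(vocab, solve(xtx, xty)))


def predict(seq, affinities):
    """Predicted intensity: sum of the affinities of the K-mers occurring in seq."""
    vocab = list(affinities)
    counts = kmer_counts(seq, vocab)
    return sum(c * affinities[km] for km, c in zip(vocab, counts))


# Synthetic demo (not PBM data): every "GA" occurrence adds 2.0 to a probe's intensity.
rng = random.Random(0)
probes = ["".join(rng.choice("ACGT") for _ in range(12)) for _ in range(40)]
intensities = [2.0 * sum(p[i:i + 2] == "GA" for i in range(len(p) - 1)) for p in probes]
affinities = fit_affinities(probes, intensities, all_kmers((1, 2)))
print(max(affinities, key=affinities.get))  # K-mer with the largest fitted affinity
```

A small penalty is used because, for fixed-length probes, counts of K-mers of different lengths are collinear (every 12-mer contains exactly twelve 1-mers and eleven 2-mers), so plain least squares would have no unique solution. Real PBM data, with tens of thousands of probes and K up to 8, would call for a sparse or iterative solver rather than this dense toy.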

Figure 11 (pone-0020059-g011). Agreement between top affinity K-mers and JASPAR sequence logos. JASPAR Core sequence logos for four TFs are shown at the top of the figure. Below the logos are the five highest-affinity K-mers from the linear model for each of the four TFs. An arrow and the characters “RC” indicate reverse complement K-mers. All sequence logos are for Mus musculus and were downloaded from JASPAR.

Because our model assigns each K-mer a TF-specific binding affinity, the binding specificity of a TF can be studied through its top affinity K-mers. These highest-affinity K-mers can then be contrasted with K-mers ranked by median probe intensity. We performed this comparison and found that the top median-intensity K-mer lists contained mostly 8-mers and hardly any shorter K-mers. We also observed a disproportionately high number of 8-mers containing guanine or cytosine repeats among the top median-intensity K-mers. In contrast, the top affinity K-mers included many short K-mers and showed weaker enrichment for G/C repeats. The top affinity K-mers were also in excellent agreement with the TF binding motifs in JASPAR Core, even for gapped motifs (Figure 11). The 20 highest linear affinity K-mers for all 86 PBM samples are available in supplementary Table S3, and the highest median-intensity K-mers in supplementary Table S4.
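The median-intensity ranking that this comparison uses as a baseline can be sketched as follows. Here a K-mer's score is assumed to be the median intensity of all probes that contain the K-mer or its reverse complement; this definition, the helper names (`revcomp`, `median_intensity_scores`, `top_kmers`) and the toy probes are illustrative assumptions, not details given in the paper.

```python
from statistics import median

# Complement table for uppercase DNA; used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (e.g. 'GAC' -> 'GTC')."""
    return seq.translate(_COMPLEMENT)[::-1]


def median_intensity_scores(probes, intensities, kmers):
    """Hypothetical scoring: median intensity of the probes containing each
    K-mer or its reverse complement (K-mers found in no probe are skipped)."""
    scores = {}
    for km in kmers:
        rc = revcomp(km)
        hits = [y for p, y in zip(probes, intensities) if km in p or rc in p]
        if hits:
            scores[km] = median(hits)
    return scores


def top_kmers(scores, n):
    """Return the n highest-scoring K-mers, best first (ties keep input order)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy probes and intensities (synthetic, not PBM measurements).
probes = ["ACGTGACC", "TTGACAAA", "GGGGCCCC", "AAAATTTT"]
intensities = [9.0, 8.0, 5.0, 1.0]
scores = median_intensity_scores(probes, intensities, ["GAC", "GGGG", "AAA", "TGAC"])
print(top_kmers(scores, 2))  # ['GAC', 'TGAC'] (both score 8.5)
```

The `top_kmers` helper accepts any dictionary of per-K-mer scores, so a dictionary of fitted linear affinities could be ranked the same way and its top list compared directly with the median-intensity list.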
