A linear model for transcription factor binding affinity prediction in protein binding microarrays.

Annala M, Laurila K, Lähdesmäki H, Nykter M - PLoS ONE (2011)

Bottom Line: Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.


Affiliation: Department of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.annala@tut.fi

ABSTRACT
Protein binding microarrays (PBMs) are a high-throughput technology used to characterize protein-DNA binding. The arrays measure a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive binding specificity catalog. We present a linear model for predicting the binding affinity of a protein toward DNA sequences based on PBM data. Our model represents the measured intensity of an individual probe as a sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding against arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.
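The idea of modeling a probe's intensity as a sum of contributions from its subsequences can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed-length subsequences (k-mers), ordinary least squares via NumPy, and a single strand, and the function names `design_matrix`, `fit_affinities`, and `predict_intensity` are hypothetical.

```python
from itertools import product

import numpy as np


def kmer_index(k):
    """Map every DNA k-mer to a column index."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def design_matrix(probes, k):
    """Count how often each k-mer occurs in each probe (one row per probe)."""
    index = kmer_index(k)
    X = np.zeros((len(probes), len(index)))
    for row, seq in enumerate(probes):
        for start in range(len(seq) - k + 1):
            X[row, index[seq[start:start + k]]] += 1
    return X, index


def fit_affinities(probes, intensities, k):
    """Least-squares fit of one affinity contribution per k-mer.

    Assumption: plain least squares; the k-mer counts are linearly
    dependent, so lstsq returns the minimum-norm solution.
    """
    X, index = design_matrix(probes, k)
    y = np.asarray(intensities, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights, index


def predict_intensity(seq, weights, index, k):
    """Predicted intensity: sum of the contributions of the k-mers in seq."""
    return sum(weights[index[seq[i:i + k]]] for i in range(len(seq) - k + 1))
```

Once the contributions are fitted, `predict_intensity` can be applied to any sequence of length at least k, which mirrors the abstract's point that the learned subsequence affinities carry over to arbitrary DNA sequences. A real PBM analysis would also need to handle both strands and noise, which this sketch ignores.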


pone-0020059-g012: Differentiating between TFs from the same family. The MEME-predicted sequence logos for Pou1f1 and Pou2f1 are shown at the top of the figure. The binding site consensus sequences from the literature [39], [40] are shown below them.

Mentions: The bonus round of the DREAM5 challenge involved identifying the unnamed transcription factors hybridized to the test PBM arrays. To achieve this, we ran the motif discovery tool MEME and compared the discovered motifs to known mammalian TF motifs in TRANSFAC and JASPAR. However, motif databases contain only a few motifs for each TF family, so the exact TF name cannot be reliably identified from database matches alone. When the TF names predicted by Tomtom were the same for several TFs, we therefore used the literature to distinguish them. For example, TFs #13 and #51 in the DREAM5 dataset were both predicted to belong to the POU family of transcription factors. However, Pou2f1 is known to bind to the consensus sequence 5′-ATGCAAAT-3′ [39], while Pou1f1 favors the consensus sequence 5′-TATNCAT-3′ [40] (see Figure 12). By comparing these conserved consensus sequences to the motifs determined by MEME, we were able to correctly identify the two TFs. Using our approach, we correctly identified seven of the 66 TFs, a result that earned us first place in the bonus round. Additionally, 15 TFs were identified within the correct TF family. To sum up, even though the computational recognition of TFs is a difficult problem in general, our example demonstrates that it is possible to distinguish TFs within the same family using sequence data.
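The comparison of literature consensus sequences with discovered motifs can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure (which relied on MEME, Tomtom, and manual literature comparison): each discovered motif is reduced to a consensus string, matching is an ungapped sliding comparison on both strands, and the function names and the example motif strings are hypothetical. Only the two consensus sequences come from the text.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Literature consensus sequences quoted in the text ([39], [40]).
LITERATURE_CONSENSUS = {"Pou2f1": "ATGCAAAT", "Pou1f1": "TATNCAT"}


def reverse_complement(seq):
    """Reverse complement of an IUPAC-coded DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def compatible(a, b):
    """True if two IUPAC codes share at least one base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def match_score(consensus, motif):
    """Best fraction of compatible positions over all ungapped alignments.

    The shorter string is slid along the longer one in both orientations.
    Note that ambiguous codes such as N match everything (a simplification).
    """
    short, long_ = sorted((consensus, motif), key=len)
    best = 0
    for candidate in (short, reverse_complement(short)):
        for offset in range(len(long_) - len(short) + 1):
            window = long_[offset:offset + len(short)]
            hits = sum(compatible(a, b) for a, b in zip(candidate, window))
            best = max(best, hits)
    return best / len(short)


def closest_tf(motif, consensus_by_tf=LITERATURE_CONSENSUS):
    """Name of the TF whose literature consensus best matches the motif."""
    return max(consensus_by_tf, key=lambda tf: match_score(consensus_by_tf[tf], motif))


# Hypothetical consensus strings standing in for two discovered motifs.
for discovered in ("TTATGCAAATGA", "GGTATACATGG"):
    print(discovered, closest_tf(discovered))
```

Scoring a consensus string discards the position-specific base frequencies that a sequence logo shows, so a fuller comparison would score the consensus against the motif's weight matrix instead.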

