A linear model for transcription factor binding affinity prediction in protein binding microarrays.

Annala M, Laurila K, Lähdesmäki H, Nykter M - PLoS ONE (2011)

Bottom Line: Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.


Affiliation: Department of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.annala@tut.fi

ABSTRACT
Protein binding microarrays (PBM) are a high-throughput technology used to characterize protein-DNA binding. The arrays measure a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive binding specificity catalog. We present a linear model for predicting the binding affinity of a protein toward DNA sequences based on PBM data. Our model represents the measured intensity of an individual probe as a sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding against arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.
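
The additive idea in the abstract, where a probe's intensity is modeled as a sum of contributions from its subsequences, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fixed subsequence length `k`, the toy probes and intensities, and the use of plain stochastic gradient descent for the least-squares fit. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k subsequence (k-mer) occurring in a probe sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def predict_intensity(seq, weights, k):
    """Predicted intensity: sum of the contributions of the probe's k-mers.

    k-mers without a learned weight contribute 0.0.
    """
    return sum(weights.get(kmer, 0.0) * n
               for kmer, n in kmer_counts(seq, k).items())


def fit_weights(probes, intensities, k, epochs=500, lr=0.05):
    """Least-squares fit of k-mer contributions by stochastic gradient descent.

    The optimizer is an illustrative choice, not the paper's procedure.
    """
    weights = {}
    for _ in range(epochs):
        for seq, y in zip(probes, intensities):
            counts = kmer_counts(seq, k)
            err = predict_intensity(seq, weights, k) - y
            for kmer, n in counts.items():
                # Gradient of 0.5 * err**2 with respect to this k-mer's weight.
                weights[kmer] = weights.get(kmer, 0.0) - lr * err * n
    return weights


# Toy data (invented): four short probes with made-up intensities.
probes = ["ACGTAC", "TTTTTT", "ACGACG", "GGGTTT"]
intensities = [3.0, 0.5, 4.0, 1.0]

weights = fit_weights(probes, intensities, k=3)
fitted = [predict_intensity(p, weights, 3) for p in probes]
unseen = predict_intensity("AAAAAA", weights, 3)
```

Once the weights are fitted, `predict_intensity` can be applied to any sequence, which mirrors the abstract's point that the learned subsequence contributions can score arbitrary DNA sequences. A real PBM data set has tens of thousands of probes, so a sparse linear solver would be a more realistic choice than this loop.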

pone-0020059-g008: Scatter plots of predicted intensities and saturated reference samples. The y-axis represents predicted probe intensities, while the x-axis represents true probe intensities on the reference array. The scatter plots clearly indicate the negative effect that reference sample saturation has on assessing the accuracy of model predictions.

Mentions: In some samples, a relatively large number of probes were saturated at high intensities (Figure 7). In total, 22 HK array samples and 13 ME array samples contained such saturation artifacts. We used quantile normalization to deal with saturation in the samples used for training the motif model, but we originally did not normalize the reference samples against which our predictions were compared. In the DREAM5 challenge, this was enforced by the organizers, who did not grant teams access to the reference samples during the challenge. However, as the scatter plots of Figure 8 show, saturation in the reference samples had a significant effect on the reported correlations. Spatial artifacts were also abundant in the PBM samples (Figure S1).
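
To make the effect of saturation on reported correlations concrete, the following Python sketch clips a synthetic reference sample at a cap and compares Pearson correlations before and after. It also includes a simple rank-based quantile normalization. The data, the cap value, and the function names are assumptions chosen for illustration; the paper's exact normalization procedure is not described in this excerpt.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def saturate(values, cap):
    """Simulate detector saturation by clipping intensities at a cap."""
    return [min(v, cap) for v in values]


def quantile_normalize(values, target):
    """Give each value the target value of the same rank.

    Tied values (for example saturated probes) are ordered arbitrarily,
    so this cannot recover a ranking that saturation has destroyed.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranked_target = sorted(target)
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        out[i] = ranked_target[rank]
    return out


# Synthetic, skewed "true" intensities and a perfect prediction of them.
true_intensity = [float(i * i) for i in range(1, 21)]
predicted = list(true_intensity)

# The reference array saturates at an (invented) cap of 150.
reference = saturate(true_intensity, cap=150.0)

r_clean = pearson(predicted, true_intensity)  # perfect prediction
r_saturated = pearson(predicted, reference)   # same prediction, clipped reference

normalized = quantile_normalize([3.0, 1.0, 2.0], [10.0, 30.0, 20.0])
```

Even though the prediction is perfect in this toy setup, the correlation against the clipped reference drops well below 1, which is the kind of distortion the scatter plots of Figure 8 are reported to show.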

