Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas.

Hill JT, Anderson KR, Mastracci TL, Kaestner KH, Sussel L - BMC Bioinformatics (2011)

Bottom Line: New prediction methods that are based on protein binding data, but do not rely on imperfect statistical models of transcription factor binding, may improve prediction sensitivity and specificity. We have also identified an alternative core sequence recognized by the Nkx2.2 homeodomain. Finally, we show evidence that the algorithm can also be applied to predict binding sites for the nuclear receptor Hnf4α.


Affiliation: Department of Genetics and Development, Columbia University, New York, NY 10032, USA.

ABSTRACT

Background: The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that describe transcription factor binding imperfectly. New prediction methods that are based on protein binding data, but do not rely on these models, may improve prediction sensitivity and specificity.

Results: We propose a method for predicting transcription factor binding sites in the genome by directly mapping data generated from protein binding microarrays (PBM) to the genome and calculating a moving average of several overlapping octamers. Using this algorithm, we predicted binding sites for the essential pancreatic islet transcription factor Nkx2.2 in the mouse genome and confirmed >90% of the tested sites by EMSA and ChIP. Scores generated by this method predicted relative binding affinity more accurately than PWM-based methods. We have also identified an alternative core sequence recognized by the Nkx2.2 homeodomain. Furthermore, we have shown that this method correctly identified binding sites in the promoters of two critical pancreatic islet β-cell genes, NeuroD1 and insulin2, that were not predicted by traditional methods. Finally, we show evidence that the algorithm can also be applied to predict binding sites for the nuclear receptor Hnf4α.

Conclusions: PBM-mapping is an accurate method for predicting Nkx2.2 binding sites and may be widely applicable for the creation of genome-wide maps of transcription factor binding sites.


Figure 4: Linear regression of various prediction methods and relative binding affinity. In each panel, the highest score obtained from the EMSA probe was compared to relative binding affinity (fraction bound) calculated from the EMSA in Figure 2. Probes with more than one predicted site (Spk3 -1044 and Nkx2.2 -1503) were excluded. Scores from probes that were not bound in the EMSA (Gcg -1080, Nkx6.2 +1669, and Ins2 -144) were plotted along the X-axis and not used for the r-squared calculation. Scores used were (A) average E-score from 7 overlapping octamers from PBM-mapping, (B) log-odds from TRANSFAC-PWM, and (C) Seed and Wobble matrix score from PBM-PWM.
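The regression described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and it assumes that a probe "not bound in the EMSA" is encoded as a fraction bound of zero, so such probes are dropped before the fit.

```python
from typing import List, Sequence, Tuple


def linear_fit_r_squared(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary for r-squared to be defined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r-squared equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def r_squared_bound_only(scores: Sequence[float], fraction_bound: Sequence[float]) -> float:
    """r-squared of score vs. fraction bound, ignoring probes with no binding.

    Assumption: unbound probes are recorded with fraction_bound == 0.
    """
    kept: List[Tuple[float, float]] = [
        (s, f) for s, f in zip(scores, fraction_bound) if f > 0
    ]
    xs = [s for s, _ in kept]
    ys = [f for _, f in kept]
    return linear_fit_r_squared(xs, ys)[2]
```

Any score type from the three panels (average E-score, log-odds, or Seed and Wobble matrix score) could be passed as `scores`, which keeps the comparison between methods on the same footing.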

Mentions: Transcription factor binding in vivo is not a binary event but a continuum of site occupancy proportional to the binding affinity (Ka) of the transcription factor for its binding site. Therefore, the ideal TFBS prediction algorithm would generate a score that is highly correlated with transcription factor binding affinity. It has been proposed that the E-score from PBM experiments is indicative of relative binding affinity, and preliminary experiments have shown correlation between individual octamer E-scores and binding affinity [3,21]. To test whether single-octamer and average E-scores are correlated with relative Nkx2.2 binding affinity, we quantified the fraction bound for each site in the EMSA analysis (normalized to the probe with the largest bound fraction) and graphed it against the single E-score of the highest-scoring octamer and against averages of 3, 5, 6, 7 or 8 overlapping octamers (Additional File 3). The fractional occupancy of a transcription factor bound to a DNA binding site is indicative of the relative binding affinities of the ligands [27]. The average of 7 overlapping scores showed the highest correlation with relative binding affinity (r-squared = 0.666) and outperformed both the TRANSFAC PWM score (r-squared = 0.305) and the PBM seed and wobble matrix score (r-squared = 0.604) (Figure 4). To confirm the correlation between the PBM-mapping score and biochemically derived binding affinity values, we analyzed 22 binding sites with Kd values that were determined for vnd, the Drosophila homolog of Nkx2.2 [28]. The homeodomains of the fly and mouse proteins share 95% amino acid identity and greater than 98% similarity; therefore, the Kd values for Nkx2.2 and vnd should also be very similar. Regression analysis of PBM-mapping scores against the Kd values for the 22 vnd sites showed strong correlation (r-squared = 0.83, Additional File 4).
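The link between fractional occupancy and affinity used above can be written out explicitly. The relation below is the standard expression for simple 1:1 equilibrium binding and is offered as background, not as a formula taken from the paper; the symbols $\theta$ (fraction of probe bound) and $[\mathrm{TF}]$ (free protein concentration) are introduced here for illustration.

```latex
\theta \;=\; \frac{[\mathrm{TF}]}{[\mathrm{TF}] + K_d}
       \;=\; \frac{K_a\,[\mathrm{TF}]}{1 + K_a\,[\mathrm{TF}]},
\qquad K_a = \frac{1}{K_d}

% At low occupancy (K_a [TF] << 1), two sites measured at the same protein
% concentration have bound fractions in approximately the ratio of their affinities:
\frac{\theta_1}{\theta_2} \;\approx\; \frac{K_{a,1}}{K_{a,2}}
```

Under these assumptions, normalizing each bound fraction to the strongest probe gives an estimate of relative affinity, which is the quantity the scores are regressed against.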
Taken together, these experiments show that PBM-mapping is a highly accurate method for predicting genome-wide binding sites.
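The scoring step behind these comparisons, a moving average over overlapping octamer E-scores, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the E-score table is assumed to be a plain dictionary keyed by octamer, and an octamer missing from the table is looked up by its reverse complement because PBM tables commonly list only one orientation.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def octamer_escores(sequence: str, escores: Dict[str, float], k: int = 8) -> List[float]:
    """E-score of every overlapping k-mer along the sequence, left to right."""
    seq = sequence.upper()
    scores = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in escores:
            scores.append(escores[kmer])
        else:
            # Assumption: the table may hold only one strand orientation.
            # Raises KeyError if neither orientation is present.
            scores.append(escores[reverse_complement(kmer)])
    return scores


def moving_average(values: List[float], window: int = 7) -> List[float]:
    """Mean of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def pbm_mapping_scores(
    sequence: str, escores: Dict[str, float], k: int = 8, window: int = 7
) -> List[Tuple[int, float]]:
    """(start position, averaged E-score) for each window of overlapping k-mers."""
    per_kmer = octamer_escores(sequence, escores, k)
    return list(enumerate(moving_average(per_kmer, window)))
```

With the defaults, each score summarizes 7 overlapping octamers, which together span 14 bases. The `window` parameter makes it straightforward to repeat the comparison of 3, 5, 6, 7 or 8 averaged scores described above.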

