Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas.

Hill JT, Anderson KR, Mastracci TL, Kaestner KH, Sussel L - BMC Bioinformatics (2011)

Bottom Line: New prediction methods that are based on protein binding data, but do not rely on statistical models of binding, may improve prediction sensitivity and specificity. We have also identified an alternative core sequence recognized by the Nkx2.2 homeodomain. Finally, we show evidence that the algorithm can also be applied to predict binding sites for the nuclear receptor Hnf4α.

Affiliation: Department of Genetics and Development, Columbia University, New York, NY 10032, USA.

ABSTRACT

Background: The creation of a complete genome-wide map of transcription factor binding sites is essential for understanding gene regulatory networks in vivo. However, current prediction methods generally rely on statistical models that describe transcription factor binding imperfectly. New prediction methods that are based on protein binding data, but do not rely on these models, may improve prediction sensitivity and specificity.

Results: We propose a method for predicting transcription factor binding sites in the genome by directly mapping data generated from protein binding microarrays (PBM) to the genome and calculating a moving average of several overlapping octamers. Using this algorithm, we predicted binding sites for the essential pancreatic islet transcription factor Nkx2.2 in the mouse genome and confirmed >90% of the tested sites by EMSA and ChIP. Scores generated from this method predicted relative binding affinity more accurately than position weight matrix (PWM)-based methods. We have also identified an alternative core sequence recognized by the Nkx2.2 homeodomain. Furthermore, we have shown that this method correctly identified binding sites in the promoters of two critical pancreatic islet β-cell genes, NeuroD1 and insulin2, that were not predicted by traditional methods. Finally, we show evidence that the algorithm can also be applied to predict binding sites for the nuclear receptor Hnf4α.

Conclusions: PBM-mapping is an accurate method for predicting Nkx2.2 binding sites and may be widely applicable for the creation of genome-wide maps of transcription factor binding sites.
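The mapping step described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window of 3 overlapping octamers, the fallback score of 0.0 for octamers missing from the table, and the reverse-complement lookup are all assumptions made for illustration. The abstract only states that PBM scores are mapped to the genome and averaged over "several" overlapping octamers.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_octamer_scores(sequence: str, escores: Dict[str, float],
                       k: int = 8, missing: float = 0.0) -> List[float]:
    """Assign a PBM E-score to the octamer starting at each position.

    Assumption: the table may list only one strand of each octamer,
    so the reverse complement is tried before falling back to `missing`.
    """
    sequence = sequence.upper()
    scores = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        score = escores.get(kmer)
        if score is None:
            score = escores.get(reverse_complement(kmer), missing)
        scores.append(score)
    return scores


def moving_average(scores: List[float], window: int = 3) -> List[float]:
    """Average each run of `window` consecutive per-octamer scores."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def pbm_map(sequence: str, escores: Dict[str, float],
            window: int = 3) -> List[float]:
    """Hypothetical end-to-end helper: map E-scores, then smooth them."""
    return moving_average(map_octamer_scores(sequence, escores), window)
```

Averaging over neighbouring octamers means that a position scores highly only when several overlapping octamers around it score highly, which is one plausible reading of why the smoothed score tracks binding better than any single octamer.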

Figure 1: Nkx2.2 binds to the alternative core sequence "GAGT". (A) Table showing the E-score distribution of octamers. E-scores were generated using protein binding microarray data. Octamers were divided into AAGT-containing, GAGT-containing and all octamers (left column). The number of octamers in each group with an E-score above 0.45 is shown in the middle column. The average E-score of all octamers in each group is shown in the right column. (B) Histogram of the E-score distribution for AAGT, GAGT, TAGT and CAGT. Each point represents the percentage of total sites within a 0.10 bin that contain the given core sequence. (C) EMSA analysis of the canonical AAGT-containing consensus probe (Sup. Table 3: "Nkx2.2 AAGT"), a GAGT core-containing probe (Sup. Table 3: "Nkx2.2 GAGT"), and a probe with no core sequence (Sup. Table 3: "Nkx2.2 No Core"). Each probe was incubated with in vitro synthesized Nkx2.2 (Myc-tagged Nkx2.2 TNT Protein) or αTC1 nuclear extract with or without transfected Myc-tagged Nkx2.2.

Mentions: We first selected and analyzed all Nkx2.2-bound octamers with an E-score greater than 0.45 (132 octamers, Figure 1A). Of these, 96 (73%) contained the previously published "AAGT" core sequence or its reverse complement. Of the remaining 36 octamers, 33 (25% of the total) had an alternative sequence, "GAGT". Three octamers did not contain either core sequence. We next calculated the average E-score for octamers containing AAGT and for octamers containing GAGT. The average of all possible octamers was used as a baseline control. AAGT- and GAGT-containing octamers had mean E-scores of 0.197 and 0.160, respectively, both significantly greater (P << 0.001) than the mean for all possible octamers (-0.029).
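The grouping described above (core sequence or its reverse complement, count above 0.45, mean E-score per group) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and assigning each octamer to the first matching core in priority order (AAGT before GAGT) is an assumption based on the phrase "of the remaining 36 octamers", not a rule confirmed by the source.

```python
from statistics import mean
from typing import Dict, Optional, Sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def core_group(octamer: str,
               cores: Sequence[str] = ("AAGT", "GAGT")) -> Optional[str]:
    """Return the first core found on either strand, or None.

    Assumption: cores are checked in priority order, so an octamer
    that contains both AAGT and GAGT is counted as AAGT.
    """
    for core in cores:
        if core in octamer or reverse_complement(core) in octamer:
            return core
    return None


def summarize(escores: Dict[str, float],
              cores: Sequence[str] = ("AAGT", "GAGT"),
              threshold: float = 0.45) -> Dict[str, Dict[str, object]]:
    """Per group: number of octamers above `threshold` and mean E-score."""
    groups: Dict[str, list] = {core: [] for core in cores}
    groups["all"] = []
    for octamer, score in escores.items():
        groups["all"].append(score)
        group = core_group(octamer, cores)
        if group is not None:
            groups[group].append(score)
    return {
        name: {
            "n_above": sum(s > threshold for s in scores),
            "mean": mean(scores) if scores else None,
        }
        for name, scores in groups.items()
    }
```

Applied to a full table of octamer E-scores, `summarize` would produce the three rows of a table like Figure 1A. The toy values used to check the sketch are invented and are not data from the paper.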

