Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers.

Yanover C, Bradley P - Nucleic Acids Res. (2011)

Bottom Line: Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences.


Affiliation: Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA.

ABSTRACT
Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.

Figure 9: Strength of target-site match to predicted binding profile correlates with in vivo activity (fold-activation of a reporter gene in a bacterial 2-hybrid assay, see text) for 401 designed ZF proteins. For each protein, we converted the structure-based PFM into a PSSM and plotted the PSSM score of the selection target site against the experimentally assayed binding activity as reported in the ZiFDB (56). Yellow circles indicate proteins with large experimental error; red circles represent a subset of proteins for which the experimental activity is likely to be an underestimate (see text).
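The PFM-to-PSSM conversion and target-site scoring mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the log2-odds form, the uniform background of 0.25, the pseudocount of 1, and the function names `pfm_to_pssm` and `score_site` are all assumptions, and the toy matrix is invented for the example.

```python
import math

BASES = "ACGT"


def pfm_to_pssm(pfm, background=0.25, pseudocount=1.0):
    """Convert a position frequency matrix into a log2-odds PSSM.

    `pfm` is a list of columns, each a dict mapping base -> count.
    The uniform background and the pseudocount are illustrative
    assumptions, not values taken from the paper.
    """
    pssm = []
    for column in pfm:
        total = sum(column.get(b, 0.0) for b in BASES) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2(((column.get(b, 0.0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def score_site(pssm, site):
    """Sum the per-position PSSM entries for a DNA site of matching length."""
    if len(site) != len(pssm):
        raise ValueError("site length must equal the number of PSSM columns")
    return sum(column[base] for column, base in zip(pssm, site))


# Toy 3-column PFM (hypothetical counts, for illustration only).
toy_pfm = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 2, "C": 2, "G": 2, "T": 2},
]
toy_pssm = pfm_to_pssm(toy_pfm)
consensus_score = score_site(toy_pssm, "AGA")
mismatch_score = score_site(toy_pssm, "CGA")
```

In this toy case the site matching the preferred bases scores higher than a site with one mismatch, and the uninformative third column contributes zero to either score.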

Mentions: The engineered zinc finger protein TATAZF (55) was the least successful target in our binding specificity prediction benchmark. To assess whether engineered zinc finger proteins are inherently more challenging than naturally occurring zinc fingers, we conducted binding specificity predictions on 401 3-finger ZF proteins generated by the OPEN (Oligomerized Pool ENgineering) platform (35) and available for download from the ZiFDB database (56). For each protein, we downloaded the amino acid sequence, the 9 base pair target site for which the protein was selected, and a quantitative in vivo measure of binding activity. Comparing the structure-based predictions to the target sites, we found that the predicted optimal base matched the target site base at 80% of the positions (2889 of the 3609 positions; full results are given in Supplementary Table S1; note that the target site is not necessarily the optimal binding site for each finger). This level of accuracy agrees well with the results reported above on a smaller set consisting primarily of naturally occurring ZFs. We then asked whether we could predict the experimentally measured level of activity seen for each protein [fold activation over background of a reporter gene in a bacterial 2-hybrid (B2H) assay, ‘B2H fold-activation’]. We calculated a predicted binding score for each target site by computing its match to a PSSM derived from the final sequences in the fragment assembly simulations. Figure 9 shows a scatter plot of the predicted binding score (x-axis) versus the logarithm of the experimentally determined B2H fold-activation (y-axis). Although there is considerable scatter, a correlation between the predicted and experimentally observed activities can be clearly seen (linear regression fit: R² = 0.23, P-value = 6.3e-24). The ZFs shown in yellow are a subset with high experimental error (SD > 5); they show larger deviations from the best-fit line. The ZFs shown in red are a subset whose target sites have higher baseline levels of promoter activity, which may lower the apparent fold-activation [Supplementary Data for Maeder et al. (35)]; indeed, these ZFs all fall below the best-fit line. Although these correlations between structure-based predictions and an in vivo biological readout are encouraging, there is likely room for improvement: using a linear model whose inputs are the experimentally determined KD values for the component ZF modules, Sander et al. (57) were able to achieve significantly more accurate (R² = 0.64) predictions of B2H fold-activation for a set of 53 modularly-assembled three-finger ZF proteins.
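The two summary statistics in the paragraph above, the per-position match rate and the R² of a linear fit, can be reproduced in outline as follows. This is a hedged sketch using only the standard library: the function names `match_fraction` and `r_squared` are hypothetical, the toy sites and activities are invented, and the use of log10 is an assumption (the text says only "the logarithm"; the base does not change R²).

```python
import math


def match_fraction(predicted_sites, target_sites):
    """Fraction of positions where the predicted optimal base equals the target base."""
    matches = 0
    total = 0
    for predicted, target in zip(predicted_sites, target_sites):
        for p_base, t_base in zip(predicted, target):
            matches += p_base == t_base
            total += 1
    return matches / total


def r_squared(xs, ys):
    """R^2 of a simple least-squares line, computed as the squared Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Arithmetic reported in the text: 401 proteins x 9 positions, 2889 matches.
n_positions = 401 * 9
reported_rate = 2889 / n_positions

# Toy data (hypothetical): predicted binding scores and B2H fold-activations.
toy_scores = [1.0, 2.5, 3.0, 4.2, 5.1]
toy_fold_activation = [1.5, 2.0, 4.0, 3.5, 9.0]
toy_r2 = r_squared(toy_scores, [math.log10(f) for f in toy_fold_activation])

toy_match = match_fraction(["GCGTGGGCG", "GAAGATGGT"], ["GCGTGGGCG", "GAAGACGGT"])
```

The reported counts are internally consistent: 401 three-finger proteins with 9-bp sites give 3609 positions, and 2889 matches correspond to roughly 80%.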