Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers.

Yanover C, Bradley P - Nucleic Acids Res. (2011)

Bottom Line: Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences.


Affiliation: Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA.

ABSTRACT
Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.

Figure 4: Binding specificity predictions. To generate a PFM for a poly-ZF protein, we perform binding simulations on individual ZF domains and combine the results into a single specificity profile. Simulation results for the 3-finger ZF protein Zif268 are shown. At the top, a subset of the final protein–DNA interface models is superimposed. Green ribbons depict the protein, with key specificity-determining sidechains shown; the DNA is portrayed in stick representation with a phosphate backbone ribbon. Carbons in the core triplet binding site are colored red. Carbons in neighboring bases that contribute to the final combined PFM are colored yellow. DNA sequence preferences calculated from the final models in 1000 independent simulations are used to construct single-finger PFMs (middle), which are combined into a binding profile for the complete protein (bottom). PFMs are depicted as sequence logos using the program WebLogo (42); structure images were generated with the PyMOL molecular graphics program.

To generate a binding specificity profile for a C2H2 ZF, we first parsed the protein sequence into individual fingers using the Pfam (43) zf-C2H2 profile hidden Markov model. Binding simulations were conducted as described above for each of the fingers individually. In each binding simulation, we consider the DNA binding site to consist of a 5 base-pair region centered on the canonical triplet. The complete DNA molecule consists of the 5 base-pair binding site together with an additional G:C base pair on either side to provide structural context. The DNA sequence of the binding site is randomized at the start of each independent simulation and optimized during the all-atom MC simulation through the energetically biased acceptance of DNA mutation moves. Because only a limited number of mutation moves are attempted during each simulation, the raw sequence preferences in the final DNA sequences are rather weak. As an estimate of the true binding preferences, we boost the raw profile by raising all frequencies to the sixth power and renormalizing (this corresponds to a linear rescaling of energies, under the mapping between probabilities and energies given by the Boltzmann distribution). Note that this boosting procedure does not change the ordering of the bases; instead, it is designed to provide an estimate of what the fully converged sequence preferences would be. The choice of exponent is somewhat arbitrary, and was based on inspection of frequency profiles from a limited number of very long MC simulations. To facilitate comparison with experimental binding data, we computed a position-specific frequency matrix (PFM) for the complete protein by combining the binding profiles for the individual fingers as indicated in Figure 4. When combining single-finger PFMs, internal fingers contribute only the three core triplet columns, while terminal fingers contribute additional context on either side (Figure 4).
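The boosting step can be illustrated with a minimal sketch. If frequencies follow a Boltzmann distribution, p_i ∝ exp(-E_i/kT), then p_i^6 ∝ exp(-6E_i/kT), so raising frequencies to the sixth power and renormalizing is equivalent to multiplying all energies by six. The Python below is not the authors' code: the function names raw_profile and boost_profile and the toy sequences are hypothetical, and each final DNA sequence is assumed to be a string over A, C, G and T of the same length.

```python
from collections import Counter

BASES = "ACGT"  # column order used for every profile below


def raw_profile(final_sequences):
    """Per-position base frequencies over the final DNA sequences
    of a set of independent simulations (one sequence per run)."""
    n_seqs = len(final_sequences)
    length = len(final_sequences[0])
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in final_sequences)
        profile.append([counts[base] / n_seqs for base in BASES])
    return profile


def boost_profile(profile, exponent=6):
    """Raise every frequency to `exponent` and renormalize each column.
    The ordering of bases within a column is preserved."""
    boosted = []
    for column in profile:
        powered = [freq ** exponent for freq in column]
        total = sum(powered)
        boosted.append([value / total for value in powered])
    return boosted


# Toy data standing in for simulation output (hypothetical sequences).
toy_sequences = ["ACGTA", "ACGTC", "AGGTA", "TCGTA"]
boosted = boost_profile(raw_profile(toy_sequences))
```

In this sketch a weak preference such as 0.4 : 0.3 : 0.2 : 0.1 sharpens to a column dominated by the first base, while the ranking of the four bases stays the same.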
To construct a simple position-specific scoring matrix (PSSM) from this PFM, we take the logarithm of the frequencies after dividing by a uniform background of 0.25.
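The combination and log-odds steps might look roughly like the following Python sketch, which rests on several assumptions not stated in the text: each single-finger PFM has five columns (one flanking column, the three triplet columns, one flanking column), fingers are listed in the 5'-to-3' order of their DNA sites, each terminal finger keeps only its outward-facing flanking column, the logarithm is natural, and a small floor (min_freq) guards against taking the logarithm of zero. The function names are hypothetical.

```python
import math


def combine_finger_pfms(finger_pfms):
    """Concatenate single-finger PFMs into one PFM for the whole protein.

    Assumes each PFM is a list of 5 columns ordered 5'->3', with the core
    triplet in columns 1-3, and that fingers are ordered along the DNA.
    Internal fingers contribute columns 1-3 only; the first and last
    fingers also contribute their outer flanking column.
    """
    combined = []
    last = len(finger_pfms) - 1
    for index, pfm in enumerate(finger_pfms):
        start = 0 if index == 0 else 1
        end = 5 if index == last else 4
        combined.extend(pfm[start:end])
    return combined


def pfm_to_pssm(pfm, background=0.25, min_freq=1e-6):
    """Log-odds score per base: log(frequency / background).
    `min_freq` is an assumed floor that keeps zero frequencies finite."""
    return [
        [math.log(max(freq, min_freq) / background) for freq in column]
        for column in pfm
    ]


# Three uninformative toy fingers give an 11-column PFM with all scores 0.
uniform_finger = [[0.25, 0.25, 0.25, 0.25] for _ in range(5)]
pssm = pfm_to_pssm(combine_finger_pfms([uniform_finger] * 3))
```

With a uniform background of 0.25, a base at frequency 0.25 scores zero, preferred bases score positive, and disfavored bases score negative.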

