Deriving high-resolution protein backbone structure propensities from all crystal data using the information maximization device.

Solis AD - PLoS ONE (2014)

Bottom Line: This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision.


Affiliation: Biological Sciences Department, New York City College of Technology, The City University of New York, Brooklyn, New York, United States of America.

ABSTRACT
The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
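As a rough illustration of what a phi-psi map at a fixed grid resolution looks like, the sketch below bins dihedral angle pairs into a periodic 2D histogram and normalizes it into a discrete PDF. This is a plain histogram estimate, not the IMD procedure or the BLCLUST method used in the paper; the function name `phi_psi_pdf`, the `pseudocount` smoothing constant, and the sample angles are all assumptions made for the example.

```python
import math


def phi_psi_pdf(angles, resolution=15.0, pseudocount=1.0):
    """Estimate a discrete phi-psi PDF on a periodic grid.

    angles: iterable of (phi, psi) pairs in degrees, each in [-180, 180].
    resolution: bin width in degrees (should divide 360 evenly).
    pseudocount: hypothetical smoothing constant added to every bin.
    """
    n_bins = int(round(360.0 / resolution))
    # Start every bin at the pseudocount so that no bin has zero probability.
    counts = [[pseudocount] * n_bins for _ in range(n_bins)]
    for phi, psi in angles:
        # Shift to [0, 360) and wrap, because dihedral angles are periodic.
        i = int(math.floor((phi + 180.0) / resolution)) % n_bins
        j = int(math.floor((psi + 180.0) / resolution)) % n_bins
        counts[i][j] += 1.0
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


# Made-up angles: two helix-like points and one strand-like point.
sample = [(-60.0, -45.0), (-58.0, -44.0), (-120.0, 130.0)]
pdf = phi_psi_pdf(sample, resolution=15.0)
```

With sparse data, a finer grid leaves most bins filled only by the pseudocount, which is the kind of trade-off that motivates choosing the resolution from the volume of available data.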

Figure 11 (pone-0094334-g011): Comparison of mutual information scores for whole protein chains brought about by the flanking residues, as assigned by triplet BETAN and BLCLUST KBPs. Eq. (12) is used to compute In(c/X_Z), the portion of the triplet score that can be attributed to the influence of flanking residues on the phi-psi conformation of the central residue. One score is computed from optimal triplet PDFs derived from BLCLUST (using weighted dynamic radius, at resolution 15.0°), and the other from BETAN PDFs. The two scores are plotted against each other. More than 92% of the protein chains lie above the diagonal, which means that BLCLUST PDFs capture helpful information from flanking residues better than BETAN PDFs do, so the generally positive influence of flanking residues is better incorporated into the PDFs derived from BLCLUST.


Mentions: The advance made by this work is to articulate the nuanced influence of the flanking residues on the backbone conformation of the central residue, given the limited amount of structural data available. Measuring In(c/X_Z) alone, the average effect of flanking residues across the protein chain, gives some indication of the success of the methodology. For each of the 740 protein chains in BLC-NEW, In(c/X_Z) was computed using both BLCLUST and BETAN PDFs. The result, plotted in Figure 11, shows that BLCLUST PDFs capture the effect of the flanking residues better than BETAN PDFs: 92.3% of chains have a higher In(c/X_Z) with BLCLUST. In addition, BLCLUST assigns a negative In(c/X_Z) to only 6.8% of the chains, whereas BETAN assigns negative values to 33.8%. A negative In(c/X_Z) suggests that, on average, the flanking residues do not help determine the backbone conformation of a protein chain, which runs contrary to what is commonly assumed about local interactions in proteins. The much higher proportion of chains assigned a negative In(c/X_Z) by BETAN indicates that its triplet PDFs are less well resolved than the BLCLUST triplet PDFs. In conclusion, BLCLUST PDFs define the effect of the flanking residues on the backbone conformation of the central residue more accurately.
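The three percentages quoted above are simple per-chain tallies, and a minimal sketch of how such a summary could be computed is shown here. It assumes the per-chain In(c/X_Z) values are already available as two equal-length lists; the function name `summarize_chain_scores`, the dictionary keys, and the toy scores are hypothetical and are not taken from the paper.

```python
def summarize_chain_scores(blclust, betan):
    """Summarize paired per-chain scores from two sets of PDFs.

    blclust, betan: equal-length sequences holding one score per protein
    chain, in the same chain order (assumed to be precomputed).
    """
    if len(blclust) != len(betan) or not blclust:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(blclust)
    # A chain counts as improved when its BLCLUST score exceeds its BETAN score.
    improved = sum(1 for a, b in zip(blclust, betan) if a > b)
    return {
        "improved_fraction": improved / n,
        "negative_blclust": sum(1 for a in blclust if a < 0) / n,
        "negative_betan": sum(1 for b in betan if b < 0) / n,
    }


# Made-up scores for four chains, for illustration only.
toy_blclust = [0.30, 0.12, -0.05, 0.20]
toy_betan = [0.10, -0.02, 0.01, 0.15]
summary = summarize_chain_scores(toy_blclust, toy_betan)
```

Applied to the 740 chains of BLC-NEW, the same tallies would correspond to the reported 92.3%, 6.8%, and 33.8%.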

