Deriving high-resolution protein backbone structure propensities from all crystal data using the information maximization device.

Solis AD - PLoS ONE (2014)

Bottom Line: This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision.


Affiliation: Biological Sciences Department, New York City College of Technology, The City University of New York, Brooklyn, New York, United States of America.

ABSTRACT
The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
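As a rough illustration of the kind of quantity an information-theoretic derivation works with, the sketch below bins phi-psi pairs on a square grid and computes the textbook plug-in estimate of the mutual information between a sequence label and the conformational bin. It is not the paper's IMD or its Eq. (9): the function names, the square grid, and the `width` parameter are assumptions made for this example only.

```python
import math
from collections import Counter

def phi_psi_bin(phi, psi, width):
    """Map a (phi, psi) pair in degrees to a cell of a square grid.

    Assumes angles lie in [-180, 180]; +180 wraps around to the first cell.
    """
    n = int(360 // width)
    i = int((phi + 180.0) // width) % n
    j = int((psi + 180.0) // width) % n
    return i, j

def mutual_information(samples, width):
    """Plug-in estimate of I(C;S) in bits.

    samples: iterable of (label, phi, psi), where label is a hypothetical
    sequence descriptor (a single residue or a triplet string).
    """
    joint = Counter()
    for label, phi, psi in samples:
        joint[(label, phi_psi_bin(phi, psi, width))] += 1
    total = sum(joint.values())

    # Marginal counts for sequence labels and for grid cells.
    count_s = Counter()
    count_c = Counter()
    for (label, cell), count in joint.items():
        count_s[label] += count
        count_c[cell] += count

    info = 0.0
    for (label, cell), count in joint.items():
        # p(c,s) / (p(c) p(s)) written directly in terms of counts.
        ratio = count * total / (count_s[label] * count_c[cell])
        info += (count / total) * math.log2(ratio)
    return info
```

With few observations per cell, a plug-in estimate of this kind tends to overstate the information, which is one reason a usable grid resolution depends on how much data is available.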


Figure 15: Dependence of mutual information score In(c/s) on crystallographic resolution. For each of the 740 protein chains in the BLC-NEW data set, the score In(c/s), derived from BLCLUST KBPs, is computed using Eq. (9) and plotted against the crystallographic resolution (in Ångstroms) of its experimental structure. A general correlation can be observed in this initial study. High-resolution structures are expected to contain phi-psi angles in the normal regions of the Ramachandran space, which are highly populated and should produce high mutual information scores. Conversely, lower-resolution structures may contain a number of unnatural phi-psi angles that are penalized by the In(c/s) function. This initial exploration points to the possibility of using triplet PDFs in structure validation and model refinement.

To begin exploring the viability of BLCLUST PDFs in structure validation and similar applications, BLCLUST KBPs were used to score the native conformations of 740 chains in BLC-NEW, a diverse collection of newly solved protein structures that are not homologous to any proteins in BLCLUST. In Figure 15, the resolution of each chain is plotted against its In(c/s) score (Eq. 9), which can be taken as a measure of the "normalness" of the phi-psi angle pairs of the experimental structure. The higher the In(c/s) score, the more the phi-psi angle pairs conform, on average, to expected and highly populated values. A general correlation can be observed in Figure 15: low-resolution crystal structures tend to have lower In(c/s) scores than higher-resolution structures. This is because low-resolution structures are likely to contain phi-psi angle pairs outside the natural regions of the Ramachandran space, which the In(c/s) function is able to detect and penalize. This initial observation supports the hypothesis that these phi-psi maps are potentially useful in structure validation and model refinement. Confirming this hypothesis with more extensive measurements, and formalizing the use of In(c/s) as a structure validation parameter, are among the future directions arising from this work.
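One simple way to put a number on a trend like this is a rank correlation between resolution and score. The sketch below computes a Spearman coefficient using only the standard library. It is an assumption about how such a check could be done, not the analysis performed in the paper, and the function names and the lists passed to `spearman` are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Assumes x and y have equal length and neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Here `x` would hold the per-chain resolutions in Ångstroms and `y` the matching per-chain scores. Because a larger resolution value means a coarser structure, the trend described above would show up as a negative coefficient.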

