CSAR data set release 2012: ligands, affinities, complexes, and docking decoys.

Dunbar JB, Smith RD, Damm-Ganamet KL, Ahmed A, Esposito EX, Delproposto J, Chinnaswamy K, Kang YN, Kubish G, Gestwicki JE, Stuckey JA, Carlson HA - J Chem Inf Model (2013)

Bottom Line: This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pK(a).

Affiliation: Department of Medicinal Chemistry, University of Michigan, 428 Church St., Ann Arbor, Michigan 48109-1065, USA. jbdunbar@umich.edu

ABSTRACT
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has currently obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets with a few representative crystal structures of each of the series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3-4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and, for the most soluble compounds, isothermal titration calorimetry. This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. However, the relative rankings within the methods are much better, and this fits with the observation that predicting relative ranking is a more tractable problem computationally. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pK(a). This data set also provides a substantial decoy set for each target consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures. The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
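
The distinction the abstract draws between absolute affinity values and relative rankings can be illustrated with a small sketch. It is not the authors' analysis: the function names are hypothetical, and the two affinity lists are synthetic values invented purely to show how two assay methods can disagree on absolute values while still ranking the compounds identically. Agreement in absolute terms is summarized by the mean absolute difference, and agreement in ranking by the Spearman rank correlation.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_abs_diff(x, y):
    """Mean absolute difference between paired measurements."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Synthetic pKd-like values for four compounds from two hypothetical assays.
method_a = [5.0, 6.0, 7.0, 8.0]
method_b = [6.2, 7.1, 7.9, 9.3]

abs_disagreement = mean_abs_diff(method_a, method_b)  # about 1.1 log units
rank_agreement = spearman(method_a, method_b)         # same ordering in both
```

In this toy case the two methods differ by more than one log unit on average, yet the rank correlation is essentially 1, which is the kind of situation in which predicting relative ranking is the better-posed problem.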


Figure 3: Percentage frequencies of RMSDs (Å) between decoy poses and the native crystal pose for the different targets. The frequencies are based on all the structures available for each target.

Mentions: Figure 3 shows the distribution of RMSDs between the decoy poses and the native pose for each of the six targets. A normal distribution with RMSD ranging from 1 to 22 Å and peaks of ∼10–15 Å can be seen for the different targets. As per the design, only one near-native pose (RMSD < 1.0 Å) and no pose in the 1–2 Å range is evident in the distribution plots. This makes a clear distinction between the near-native pose and other decoy poses in evaluating scoring functions. We acknowledge that the set may be biased by the force field used, and users should determine whether their methods will be significantly affected. Minimization of the ligand with a rigid protein may be necessary in some cases.
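
A distribution like the one described for Figure 3 could be computed along the following lines. This is a minimal sketch under stated assumptions, not the CSAR pipeline: poses are assumed to be equal-length lists of (x, y, z) coordinates with matching atom order, the RMSD is taken in place without superposition, and the function names, the 1 Å bin width, and the toy coordinates are all hypothetical.

```python
import math
from collections import Counter


def rmsd(pose_a, pose_b):
    """RMSD in Å between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame
    (no superposition is performed).
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty with the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def percentage_frequencies(values, bin_width=1.0):
    """Map each occupied bin's lower edge to the percentage of values in
    [edge, edge + bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {i * bin_width: 100.0 * n / total for i, n in sorted(counts.items())}


def shifted(pose, dx):
    """Toy decoy generator: translate a pose by dx Å along x."""
    return [(x + dx, y, z) for x, y, z in pose]


# Toy "native" ligand pose and four toy decoys (made-up coordinates).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
decoys = [shifted(native, dx) for dx in (0.5, 3.0, 3.5, 10.0)]

rmsds = [rmsd(native, decoy) for decoy in decoys]
freqs = percentage_frequencies(rmsds)
```

With these toy poses, one decoy falls in the 0–1 Å bin and the 1–2 Å bin is empty, mirroring the design described above in which a single near-native pose is kept clearly separated from the remaining decoys.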

