CSAR data set release 2012: ligands, affinities, complexes, and docking decoys.

Dunbar JB, Smith RD, Damm-Ganamet KL, Ahmed A, Esposito EX, Delproposto J, Chinnaswamy K, Kang YN, Kubish G, Gestwicki JE, Stuckey JA, Carlson HA - J Chem Inf Model (2013)

Bottom Line: This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pKa.


Affiliation: Department of Medicinal Chemistry, University of Michigan, 428 Church St., Ann Arbor, Michigan 48109-1065, USA. jbdunbar@umich.edu

ABSTRACT
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has currently obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets with a few representative crystal structures of each of the series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3-4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and, for the most soluble compounds, isothermal titration calorimetry. This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. However, the relative rankings within the methods are much better, and this fits with the observation that predicting relative ranking is a more tractable problem computationally. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures. The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
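
The contrast drawn in the abstract between absolute affinity values and relative rankings can be illustrated with a small sketch. It is not the authors' analysis: the two lists of pKd values are hypothetical numbers invented for illustration, and the helper names (`ranks`, `spearman`, `mean_abs_diff`) are assumptions. It shows how two assays can disagree by about a log unit in absolute terms while ordering the compounds identically.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_abs_diff(x, y):
    """Mean absolute difference between paired measurements."""
    return mean(abs(a - b) for a, b in zip(x, y))

# HYPOTHETICAL pKd values for the same five compounds from two assays.
# These numbers are made up for illustration and are not CSAR data.
assay_a = [5.1, 6.0, 6.8, 7.5, 8.2]
assay_b = [6.3, 6.9, 8.1, 8.4, 9.6]

rho = spearman(assay_a, assay_b)       # agreement in relative ranking
mad = mean_abs_diff(assay_a, assay_b)  # disagreement in absolute value (log units)
print(f"Spearman rho = {rho:.2f}, mean |delta pKd| = {mad:.2f}")
```

In this made-up case the rank correlation is perfect even though the assays differ by more than one log unit on average, which is the kind of situation in which ranking is better defined than absolute prediction.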

Figure 1: Selection method utilizing recursive partitioning and coupled multiple distribution analysis in JMP (ref 31).

Mentions: Approximately 50 active compounds and 10 inactive ones per series are chosen to provide reasonable coverage of properties and affinities. We often receive much more than 50 compounds per series. Typically, a company has a couple of thousand ligands to choose from. To choose the representative subset, we use recursive partitioning (ref 31) based on the negative log of the affinity versus calculated physical properties: number of hydrogen-bond donors, hydrogen-bond acceptors, number of rotatable bonds, molecular weight, and topological surface area. We split the full set until the log worth has reached its set point (JMP default, ref 31) or there are no more than five compounds in each individual leaf. This allows us to classify and bin the compounds in a logical fashion using relevant variables. Using this binning in conjunction with visualization of a distribution analysis, we make initial selections in the spreadsheet and then tailor those selections to obtain a more even distribution across the given calculated properties (Figure 1).
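
The binning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' JMP workflow. It swaps JMP's log-worth criterion for a plain reduction in the sum of squared errors (with a hypothetical `min_gain` threshold), and the compound table is random synthetic data. The field names (`hbd`, `hba`, `rotb`, `mw`, `tpsa`, `pAff`) and the `pick_representatives` helper are invented for the example. The manual, visualization-guided tailoring of the selection is not modeled.

```python
import math
import random
from statistics import mean

FEATURES = ["hbd", "hba", "rotb", "mw", "tpsa"]  # assumed descriptor names

def p_affinity(affinity_molar):
    """Negative log10 of an affinity in mol/L (1e-9 M gives about 9)."""
    return -math.log10(affinity_molar)

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, features):
    """Return (gain, feature, threshold) of the split with the largest SSE
    reduction, or None if no feature has two distinct values."""
    parent = sse([r[target] for r in rows])
    best = None
    for f in features:
        distinct = sorted({r[f] for r in rows})
        for t in distinct[:-1]:  # rows with value <= t go left
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def partition(rows, target, features, max_leaf=5, min_gain=1e-9):
    """Split recursively until a node holds at most max_leaf rows or the best
    split gains less than min_gain (a stand-in for the log-worth set point)."""
    if len(rows) <= max_leaf:
        return [rows]
    split = best_split(rows, target, features)
    if split is None or split[0] < min_gain:
        return [rows]
    _, f, t = split
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return (partition(left, target, features, max_leaf, min_gain)
            + partition(right, target, features, max_leaf, min_gain))

def pick_representatives(leaves, target):
    """Hypothetical selection rule: the middle-affinity compound of each leaf."""
    return [sorted(leaf, key=lambda r: r[target])[len(leaf) // 2] for leaf in leaves]

# SYNTHETIC compound table, random values for illustration only.
rng = random.Random(0)
compounds = [
    {
        "id": i,
        "hbd": rng.randint(0, 5),
        "hba": rng.randint(1, 10),
        "rotb": rng.randint(0, 12),
        "mw": rng.uniform(250.0, 550.0),
        "tpsa": rng.uniform(20.0, 140.0),
        "pAff": p_affinity(10 ** -rng.uniform(5.0, 9.0)),
    }
    for i in range(60)
]

leaves = partition(compounds, "pAff", FEATURES)
picks = pick_representatives(leaves, "pAff")
print(f"{len(leaves)} leaves, {len(picks)} representative compounds")
```

Taking one compound per leaf is only one possible way to spread selections across the bins. The text describes an interactive process in which the initial selections are adjusted by eye against the property distributions.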

