PubChem structure-activity relationship (SAR) clusters.

Kim S, Han L, Yu B, Hähnke VD, Bolton EE, Bryant SH - J Cheminform (2015)

Bottom Line: The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA.

ABSTRACT

Background: Developing structure-activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. Although information on millions of compounds and their bioactivities is freely available to the public, it is very challenging to infer a meaningful and novel SAR from that information.

Results: Research discussed in the present paper employed a bioactivity-centered clustering approach to group 843,845 non-inactive compounds stored in PubChem according to both structural similarity and bioactivity similarity, with the aim of mining bioactivity data in PubChem for useful SAR information. The compounds were clustered in three bioactivity similarity contexts: (1) non-inactive in a given bioassay, (2) non-inactive against a given protein, and (3) non-inactive against proteins involved in a given pathway. In each context, these small molecules were clustered according to their two-dimensional (2-D) and three-dimensional (3-D) structural similarities. The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity.
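The two-level grouping described above (first by bioactivity context, then by structural similarity within each context) can be illustrated with a minimal sketch. The toy fingerprints, the 0.5 Tanimoto threshold, the single-linkage rule, and all function and variable names below are hypothetical choices made for illustration; they are not the paper's 2-D and 3-D similarity measures or its clustering procedure.

```python
from collections import defaultdict
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(fingerprints: dict, threshold: float) -> list:
    """Single-linkage clustering (an assumption made for this sketch):
    any pair at or above the threshold ends up in the same cluster."""
    parent = {cid: cid for cid in fingerprints}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for cid in fingerprints:
        groups[find(cid)].add(cid)
    return sorted(sorted(members) for members in groups.values())


def sar_clusters(activity: dict, fingerprints: dict, threshold: float = 0.5) -> dict:
    """Cluster the non-inactive compounds of each context separately."""
    return {
        context: cluster_by_similarity(
            {cid: fingerprints[cid] for cid in compounds}, threshold
        )
        for context, compounds in activity.items()
    }


# Hypothetical toy data: fingerprints as sets of "on" bit positions, and
# a map from context (e.g. an assay) to its non-inactive compounds.
fingerprints = {
    "c1": frozenset({1, 2, 3, 4}),
    "c2": frozenset({1, 2, 3, 5}),
    "c3": frozenset({7, 8, 9}),
    "c4": frozenset({7, 8, 9, 10}),
}
activity = {
    "assay_A": ["c1", "c2", "c3"],
    "assay_B": ["c3", "c4"],
}

clusters = sar_clusters(activity, fingerprints)
for context, groups in clusters.items():
    print(context, groups)
```

Because clustering runs once per context, the same compound (here `c3`) can appear in clusters for several contexts, which is consistent with the cluster count being far larger than the number of compounds.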

Conclusions: The PubChem SAR clusters, pre-computed using publicly available bioactivity information, make it possible to quickly navigate and narrow down the compounds of interest. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.


Figure 11: CTCT-opt, ComboTCT-opt, and 2-D clusters for BSID 545294. The nodes are non-inactive compounds in assays involved in BSID 545294. The node colors represent the original literature from which the biological activities of the compounds were extracted (green for PMID 17346963, cyan for PMID 18707087, purple for PMID 21309593, and red for PMID 21591606). The node labels are omitted for brevity, but information on cluster members can be found in Additional file 3.

Mentions: The PubChem SAR clusters for BSID 545294 are provided in Additional file 3. The corresponding CTCT-opt, ComboTCT-opt, and 2-D clusters are displayed in Figure 11. For comparison purposes, the 2-D dendrogram for the 72 compounds contained in the 2-D clusters is displayed in Additional file 4: Figure S3. Most noticeable is that compounds from the same publication tend to be clustered together, except for those from PMIDs 21309593 and 21591606. Although the compounds from these two publications target different proteins in the visual cycle (rhodopsin for PMID 21309593 and RBP4 for PMID 21591606), the natural ligands of the two proteins (i.e., 11-cis-retinal for rhodopsin and all-trans-retinol for RBP4) are structurally very similar to each other: they differ by the configuration of one of their stereocenters (trans- vs. cis-configurations) and the functional group at the end of their carbon chain (hydroxyl vs. aldehyde groups). Structural similarity to these natural ligands was the basis for selection of the compound sets in the two publications.

