PubChem structure-activity relationship (SAR) clusters.

Kim S, Han L, Yu B, Hähnke VD, Bolton EE, Bryant SH - J Cheminform (2015)

Bottom Line: The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA.

ABSTRACT

Background: Developing structure-activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. Although information on millions of compounds and their bioactivities is freely available to the public, it is very challenging to infer a meaningful and novel SAR from that information.

Results: Research discussed in the present paper employed a bioactivity-centered clustering approach to group 843,845 non-inactive compounds stored in PubChem according to both structural similarity and bioactivity similarity, with the aim of mining bioactivity data in PubChem for useful SAR information. The compounds were clustered in three bioactivity similarity contexts: (1) non-inactive in a given bioassay, (2) non-inactive against a given protein, and (3) non-inactive against proteins involved in a given pathway. In each context, these small molecules were clustered according to their two-dimensional (2-D) and three-dimensional (3-D) structural similarities. The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity.

Conclusions: The PubChem SAR clusters, pre-computed using publicly available bioactivity information, make it possible to quickly navigate and narrow down the compounds of interest. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.



Figure 9: Collapse of conformer clusters into compound clusters. A compound is represented by a square node and each of its conformers by a round node of the same color. An edge between two conformer nodes indicates that the distance between them is below the dthresh value used for clustering, and an edge between two compound nodes indicates that the distance of at least one conformer pair arising from the two compounds is below the dthresh value. The PubChem 3-D SAR clustering algorithm is initially applied to conformers of non-inactive compounds, resulting in conformer clusters (left panel). Compound clusters are constructed by replacing the conformers with their respective compounds (right panel). As a result, a compound can occur in multiple compound clusters (via its different conformers).
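
The edge rule in this caption can be illustrated with a minimal Python sketch. It is not the PubChem implementation: the compound labels, conformer labels, distance values, and the function names `conformer_distance`, `compounds_linked`, and `compound_edges` are all hypothetical, and the distance metric itself is assumed to be given.

```python
from itertools import product

# Hypothetical conformers per compound (labels invented for illustration).
conformers = {
    "A": ["A1", "A2"],
    "B": ["B1"],
    "C": ["C1", "C2"],
}

# Hypothetical pairwise conformer distances; any pair not listed
# is treated as infinitely far apart.
distances = {
    frozenset(("A1", "B1")): 0.20,
    frozenset(("A2", "B1")): 0.65,
    frozenset(("A2", "C1")): 0.30,
    frozenset(("B1", "C2")): 0.80,
}


def conformer_distance(c1, c2):
    """Look up the (symmetric) distance between two conformers."""
    return distances.get(frozenset((c1, c2)), float("inf"))


def compounds_linked(cmpd_a, cmpd_b, dthresh):
    """True if at least one conformer pair from the two compounds is below dthresh."""
    return any(
        conformer_distance(a, b) < dthresh
        for a, b in product(conformers[cmpd_a], conformers[cmpd_b])
    )


def compound_edges(dthresh):
    """List every compound pair that would be joined by an edge at this threshold."""
    names = sorted(conformers)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if compounds_linked(a, b, dthresh)
    ]
```

With these made-up distances, a threshold of 0.5 links A to B (via A1 and B1) and A to C (via A2 and C1), while B and C stay unlinked because none of their conformer pairs falls below the threshold.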

Mentions: A notable observation in Figure 8 is that the total number of nodes for the ComboTCT-opt clusters is 41, which is greater than the number of non-inactive compounds used in the SAR clustering, indicating that some of the compounds occur more than once. For example, CIDs 3038 and 5284549 occur in both Clusters 19 and 23, making Cluster 23 appear to be a subset of Cluster 19. However, this is not true at the conformer level because the conformers involved in the two clusters are not identical. Note that it is not compounds but their conformers that were clustered during the 3-D SAR clustering. As illustrated in Figure 9, a single compound can occur in different 3-D clusters via different conformers because multiple conformers per compound were used for 3-D clustering. In contrast, a compound can occur only once in 2-D clusters. This explains why there are many more 3-D clusters than 2-D clusters (as observed in Figure 1). In essence, by using up to ten conformers for each compound, the 3-D clustering considers up to ten times as many objects as the 2-D clustering does, resulting in the increased count of 3-D clusters over 2-D clusters.
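
The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the cluster IDs, compound IDs, and conformer indices are invented, and the helper names `collapse_to_compound_clusters` and `cluster_membership` are hypothetical rather than part of any PubChem tool.

```python
from collections import defaultdict

# Hypothetical conformer clusters: cluster ID -> list of
# (compound ID, conformer index) pairs. All values are made up.
conformer_clusters = {
    1: [(101, 0), (102, 0), (103, 1)],
    2: [(101, 2), (102, 3)],
    3: [(104, 0), (104, 1), (105, 0)],
}


def collapse_to_compound_clusters(clusters):
    """Replace each conformer by its parent compound, dropping repeats within a cluster."""
    return {
        cluster_id: sorted({compound for compound, _ in members})
        for cluster_id, members in clusters.items()
    }


def cluster_membership(compound_clusters):
    """Map each compound to the list of compound clusters it occurs in."""
    membership = defaultdict(list)
    for cluster_id, compounds in sorted(compound_clusters.items()):
        for compound in compounds:
            membership[compound].append(cluster_id)
    return dict(membership)


compound_clusters = collapse_to_compound_clusters(conformer_clusters)
membership = cluster_membership(compound_clusters)

# Total nodes drawn across all compound clusters vs. distinct compounds.
total_nodes = sum(len(compounds) for compounds in compound_clusters.values())
distinct_compounds = len(membership)
```

In this toy input, compounds 101 and 102 appear in both cluster 1 and cluster 2, so cluster 2 looks like a subset of cluster 1 at the compound level even though the two clusters share no conformer. The total node count (7) therefore exceeds the number of distinct compounds (5), which mirrors the situation described for Figure 8.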

