PubChem structure-activity relationship (SAR) clusters.

Kim S, Han L, Yu B, Hähnke VD, Bolton EE, Bryant SH - J Cheminform (2015)

Bottom Line: The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA.

ABSTRACT

Background: Developing structure-activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. Although information on millions of compounds and their bioactivities is freely available to the public, it is very challenging to infer a meaningful and novel SAR from that information.

Results: Research discussed in the present paper employed a bioactivity-centered clustering approach to group 843,845 non-inactive compounds stored in PubChem according to both structural similarity and bioactivity similarity, with the aim of mining bioactivity data in PubChem for useful SAR information. The compounds were clustered in three bioactivity similarity contexts: (1) non-inactive in a given bioassay, (2) non-inactive against a given protein, and (3) non-inactive against proteins involved in a given pathway. In each context, these small molecules were clustered according to their two-dimensional (2-D) and three-dimensional (3-D) structural similarities. The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity.
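The three bioactivity contexts can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the `PATHWAYS` mapping, the function name `group_by_context`, and the reading of "non-inactive" as "any outcome other than inactive" are all assumptions made for illustration. Each resulting group would then be clustered separately by 2-D and 3-D structural similarity.

```python
from collections import defaultdict

# Hypothetical toy records: (compound CID, assay AID, protein GI, outcome).
RECORDS = [
    (101, 1, 5001, "active"),
    (102, 1, 5001, "inactive"),
    (103, 2, 5001, "inconclusive"),
    (104, 3, 5002, "active"),
]

# Hypothetical mapping from a protein GI to the pathway IDs (BSIDs) it belongs to.
PATHWAYS = {5001: [9001], 5002: [9001, 9002]}


def group_by_context(records, pathways):
    """Group non-inactive compounds by assay, by protein, and by pathway.

    Assumption: "non-inactive" means any outcome other than "inactive".
    """
    by_assay = defaultdict(set)
    by_protein = defaultdict(set)
    by_pathway = defaultdict(set)
    for cid, aid, gi, outcome in records:
        if outcome == "inactive":
            continue  # inactive compounds are left out of every context
        by_assay[aid].add(cid)
        by_protein[gi].add(cid)
        for bsid in pathways.get(gi, []):
            by_pathway[bsid].add(cid)
    return dict(by_assay), dict(by_protein), dict(by_pathway)


by_assay, by_protein, by_pathway = group_by_context(RECORDS, PATHWAYS)
```

In this toy data, compound 102 drops out because it is inactive, while compound 103 joins the protein-centric group for GI 5001 even though it was tested in a different assay than compound 101.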

Conclusions: The PubChem SAR clusters, pre-computed using publicly available bioactivity information, make it possible to quickly navigate and narrow down the compounds of interest. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.


Figure 5: Cluster overlap between similarity measures. The overlap between clusters from five different similarity measures is quantified with the average O(i, j) values, where i and j are indices for rows and columns, respectively (see text for the definition).

Figure 5 shows the average O(i, j) values over all AIDs, GIs, and BSIDs. Among the four 3-D cluster types, the STST-opt clusters shared the fewest compounds with the other three. For example, for the assay-centric clusters, the average O(STST-opt, j) and O(i, STST-opt) values between STST-opt and the other three 3-D cluster types were 71–79%, whereas the average values among the other three 3-D cluster types were 85% or greater. Interestingly, the STST-opt clusters also showed the least overlap with the 2-D Tanimoto similarity clusters, with O(STST-opt, 2-D) and O(2-D, STST-opt) values of 76% and 69%, respectively, which are lower than any other O(i, 2-D) and O(2-D, j) values between the 2-D similarity measure and the others. This may be because, among the four 3-D similarity measures considered, STST-opt is the only one that does not take feature (or functional group) similarity into account. The other three 3-D similarity measures use feature atoms that represent six functional group types, so they can capture, to some extent, the structural information that is encoded in molecular fingerprints. The STST-opt similarity, by contrast, uses only the steric shape of the molecule, which may explain why its clusters overlapped least with those from the other similarity methods.
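The definition of O(i, j) is given in the article text and is not reproduced in this excerpt, so the following is only a sketch under a stated assumption: O(i, j) is taken to be the percentage of compounds placed in clusters by measure i that are also placed in clusters by measure j. That reading is consistent with the asymmetry reported above (76% versus 69%), but it is not confirmed by this excerpt. The function names and toy clusters are hypothetical.

```python
def clustered_compounds(clusters):
    """Return the set of all compound IDs that appear in any cluster."""
    return set().union(*clusters)


def overlap(clusters_i, clusters_j):
    """Assumed O(i, j): percent of compounds clustered by i that j also clusters.

    This definition is an assumption for illustration, not the article's.
    """
    in_i = clustered_compounds(clusters_i)
    in_j = clustered_compounds(clusters_j)
    if not in_i:
        return 0.0  # no compounds clustered by measure i
    return 100.0 * len(in_i & in_j) / len(in_i)


# Hypothetical clusters (sets of compound IDs) from two similarity measures.
shape_clusters = [{1, 2, 3}, {4, 5}]
fingerprint_clusters = [{1, 2}, {4, 6}]

o_shape_fp = overlap(shape_clusters, fingerprint_clusters)
o_fp_shape = overlap(fingerprint_clusters, shape_clusters)
```

Because the denominator is the number of compounds clustered by the first measure, the two directions generally differ, which is why a full matrix such as the one in Figure 5 reports both O(i, j) and O(j, i).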

