Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.

Awale M, Jin X, Reymond JL - J Cheminform (2015)

Bottom Line: We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.


Affiliation: Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland.

ABSTRACT

Background: Tools to explore large compound databases in search of analogs of query molecules provide strategically important support in drug discovery by helping to identify available analogs of any given reference or hit compound through ligand-based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as the similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp), which efficiently encode molecular shape and pharmacophores but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures).

Results: Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances.

Conclusions: 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects.

Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.
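The abstract names the city-block distance as the similarity measure between fingerprints. As a minimal sketch of that idea, assuming fingerprints are equal-length lists of integer bit values, the distance is the sum of absolute per-bit differences, and a search ranks database entries by increasing distance to the query. The function names and the dictionary-based "database" below are hypothetical and purely illustrative; they do not represent the authors' search implementation.

```python
def city_block_distance(fp_a, fp_b):
    """Sum of absolute per-bit differences between two count fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))


def rank_by_similarity(query_fp, database):
    """Return (name, distance) pairs sorted from most to least similar.

    `database` is a hypothetical mapping of compound name -> fingerprint.
    """
    scored = [(name, city_block_distance(query_fp, fp))
              for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

A smaller distance means a closer analog, so the first entries of the ranked list are the nearest neighbours of the query.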


Figure 1: 3D-atom pair fingerprint design. A-C. Distance sampling for 3D-atom pair fingerprints, illustrated for an atom-pair distance of 8.51 Å. A. A gaussian curve is drawn (red) with its maximum centred at the atom-pair distance of 8.51 Å and a width of 18% of the atom-pair distance. The gaussian is then sampled at 16 distance values B1-B16 (blue vertical bars): 1.45, 1.71, 2.02, 2.38, 2.81, 3.32, 3.91, 4.62, 5.45, 6.43, 7.59, 8.96, 10.57, 12.47, 14.71 and 17.36 Å (16 bit values at d_{n+1} = d_n × 1.18). B. Regular binning: the atom-pair distance of 8.51 Å produces an increment of 1 in the R18 bin covering the range of 8.5-9 Å. C. Bit values B1-B16 for the atom pair at 8.51 Å from the gaussian/exponential sampling principle in A. D. Average bit value and standard deviation (SD) of R3DAPfp and 3DAPfp for all molecules from the Cambridge Structural Database (CSD, 110 000 molecules) and ZINC (23.2 M molecules).
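The numbers in this caption can be checked with a short sketch. It assumes the 16 sampling distances form a geometric series starting at 1.45 Å with ratio 1.18, and that the regular bins are numbered from R1 covering 0 to 0.5 Å (an assumption consistent with R18 covering 8.5-9 Å). The function names are hypothetical.

```python
import math


def sampling_distances(first=1.45, ratio=1.18, count=16):
    """Geometric series of sampling distances in angstroms (B1..B16)."""
    return [first * ratio ** n for n in range(count)]


def regular_bin_number(distance, width=0.5):
    """1-based number of the regular bin containing `distance`.

    Assumes bin R1 covers [0, width), R2 covers [width, 2*width), and so on.
    """
    return math.floor(distance / width) + 1


if __name__ == "__main__":
    print([round(d, 2) for d in sampling_distances()])
    print(regular_bin_number(8.51))
```

Under these assumptions the series reproduces the sampling values listed in the caption, ending at 17.36 Å, and a distance of 8.51 Å falls in bin R18.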

Mentions: The 3D-fingerprints were designed in direct analogy to our recently reported 2D atom pair fingerprints, with a simple version tailored for shape similarity with all heavy atoms treated equally (3DAPfp), and an atom category extended version (3DXfp) tailored for pharmacophore similarity, considering hydrophobic atoms (Hyb), H-bond donors (HBD), H-bond acceptors (HBA), planar atoms (sp2), and the HBD-HBA cross-pair as categories. In contrast to 2D-fingerprints, for which distance bins are automatically defined by the topological distance counted in number of bonds through the shortest path, 3D-fingerprints require a binning principle for the through-space distance to assign atom pairs to distance bins. Following an approach similar to that of Sheridan et al. [29], each through-space atom-pair distance was converted to a gaussian function with its maximum value at the atom-pair distance and a width of 18% of the atom-pair distance, and the function was sampled at 16 values between 1.45 Å and 17.36 Å, each interval between sampling values being 1.18 times broader than the preceding interval (16-bit 3DAPfp and 80-bit 3DXfp). The atom pair bit value increments were summed, and the sum values normalized to HAC^1.5 (HAC = heavy atom count), which reduced sensitivity to molecular size. This gaussian/exponential sampling principle allowed for a certain degree of fuzziness in the shape perception at large distances while reducing the dimensionality of the fingerprint. To test if this concept was useful, two additional 3D-fingerprints were created by simply binning the distance at regular 0.5 Å intervals up to 20 Å and assigning each atom pair to a single bit, normalizing bit values to the heavy atom count (regular binning: 40-bit R3DAPfp and 200-bit R3DXfp). For each of the four fingerprints (3DAPfp, 3DXfp, R3DAPfp and R3DXfp), the bit values were expressed in percent and rounded to the integer value.
The fingerprint design and bit-value profiles of R3DAPfp and 3DAPfp for the reference databases CSD and ZINC are illustrated in Figure 1.
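The gaussian/exponential sampling described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code, and it rests on several assumptions: the "width" of 18% is taken as the gaussian standard deviation, the gaussian has unit height, "expressed in percent" is taken as multiplication by 100, and the input is a plain list of heavy-atom (x, y, z) coordinates in Å. All function names are hypothetical.

```python
import math
from itertools import combinations

# 16 sampling distances: 1.45 A, each subsequent value 1.18 times larger.
SAMPLING_POINTS = [1.45 * 1.18 ** n for n in range(16)]


def pair_bit_values(distance, rel_width=0.18):
    """Unit-height gaussian centred on `distance`, read at each sampling point.

    Assumption: the 18% 'width' is used as the standard deviation.
    """
    sigma = rel_width * distance
    return [math.exp(-0.5 * ((s - distance) / sigma) ** 2)
            for s in SAMPLING_POINTS]


def fingerprint_3dap(coords):
    """16-value shape fingerprint from heavy-atom coordinates (sketch).

    Sums the gaussian increments of all atom pairs, normalizes by HAC**1.5,
    scales to percent (assumed to mean x100) and rounds to integers.
    """
    hac = len(coords)
    bits = [0.0] * len(SAMPLING_POINTS)
    if hac < 2:
        return [0] * len(SAMPLING_POINTS)
    for atom_a, atom_b in combinations(coords, 2):
        d = math.dist(atom_a, atom_b)
        if d == 0.0:
            continue  # coincident atoms carry no distance information
        for i, value in enumerate(pair_bit_values(d)):
            bits[i] += value
    scale = 100.0 / hac ** 1.5
    return [round(b * scale) for b in bits]
```

Because the gaussian broadens with distance while the sampling points spread out geometrically, a long atom pair contributes to a few neighbouring bits rather than to a single one, which is the fuzziness at large distances that the text refers to.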

