Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.

Awale M, Jin X, Reymond JL - J Cheminform (2015)

Bottom Line: We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases.

Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.


Affiliation: Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland.

ABSTRACT

Background: Tools for exploring large compound databases in search of analogs of query molecules provide strategically important support in drug discovery, helping to identify available analogs of any given reference or hit compound by ligand-based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp), which efficiently encode molecular shape and pharmacophores but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures).
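As a minimal sketch of the similarity measure named above: the city-block (Manhattan) distance between two count fingerprints is the sum of absolute differences over all positions, and a smaller distance means a closer analog. The function names and the toy fingerprints are illustrative assumptions, not the authors' implementation, and the brute-force ranking shown here says nothing about how the databases were formatted for fast searching.

```python
def city_block_distance(fp_a, fp_b):
    """Sum of absolute per-position differences between two count vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))


def nearest_neighbors(query_fp, database, k):
    """Return the k database keys closest to the query (brute force).

    `database` is a hypothetical dict mapping a compound id to its fingerprint.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: city_block_distance(query_fp, item[1]),
    )
    return [compound_id for compound_id, _ in ranked[:k]]


if __name__ == "__main__":
    toy_db = {"a": [1, 2, 3], "b": [0, 0, 0], "c": [1, 2, 4]}
    print(nearest_neighbors([1, 2, 3], toy_db, k=2))
```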

Results: Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances.
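The idea of counting atom pairs at increasing through-space distance intervals can be sketched as a simple histogram over Euclidean interatomic distances. The abstract does not give the interval boundaries or any weighting or normalization, so the 16 uniform bins of 1.0 Å below (with the last bin collecting all longer distances) are purely hypothetical choices for illustration, as is the function name.

```python
import math
from itertools import combinations


def atom_pair_histogram(coords, n_bins=16, bin_width=1.0):
    """Count atom pairs per through-space distance interval.

    coords: list of (x, y, z) tuples in angstroms.
    n_bins, bin_width: hypothetical binning, NOT the published intervals.
    """
    counts = [0] * n_bins
    for p, q in combinations(coords, 2):
        distance = math.dist(p, q)  # Euclidean (through-space) distance
        # Distances beyond the last interval fall into the final bin.
        index = min(int(distance // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Three toy atoms; real input would be a 3D conformer.
    toy_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0)]
    print(atom_pair_histogram(toy_coords))
```

Because the histogram depends on 3D coordinates rather than bond counts, a folded and an extended conformer of the same molecule produce different vectors, which is the property that distinguishes these fingerprints from their topological 2D parents.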

Conclusions: 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects.

Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.

Figure 2: Recovery statistics of the 100 closest analogs of CSD molecules according to ROCS Shape Tanimoto (A), Color Tanimoto (B) and ComboScore (C), by LBVS using various fingerprints, for each of the 110,000 molecules in CSD from their size-constrained subsets (all CSD molecules within HAC = query ± 2). For each of the three cases (A-C), the frequency histogram of AUC values for the various fingerprints is shown on the left, and the average AUC value as a function of position in the shape triangle is shown on the right. The shape triangle results from plotting the normalized moments of inertia of the molecules and distinguishes rod-like, disc-like and sphere-like shapes. Continuous color scale: AUC ≤ 50%: blue, 58%: cyan, 66%: green, 75%: yellow, 80%: red, ≥ 90%: magenta. See also Additional file 1: Figures S1 and S2 in the SI for data showing recovery statistics for different variants of 3DAPfp, 3DXfp, R3DAPfp and R3DXfp.

Mentions: LBVS for 3D-shape and pharmacophore analogs using the various fingerprints was tested for 110,000 organic molecules of up to 50 atoms from the Cambridge Structural Database (CSD), which reports experimentally determined 3D coordinates covering a broad range of molecular shapes as measured by the normalized principal moment of inertia (nPMI) triangle [17], including significant coverage of disc-like and spherical shapes. For each of the 110,000 CSD molecules, three series of “actives” were defined as the 100 closest shape, pharmacophore, or shape + pharmacophore analogs, which were the 100 highest scoring CSD compounds according to one of the following three scoring functions: ROCS (Rapid Overlay of Chemical Structures) Shape Tanimoto (3D-shape), ROCS Color Tanimoto (3D-pharmacophore), and ROCS ComboScore (combined 3D-shape and 3D-pharmacophore) [18,38]. Receiver operating characteristic (ROC) curves were then computed for each of the 110,000 CSD compounds for retrieving each of the three series of 100 “actives” (3D-shape and pharmacophore analogs) from a size-constrained subset of CSD (containing all molecules of size HAC ± 2) by LBVS using each of the different fingerprints (Figure 2).
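The evaluation described above can be sketched in two steps: restrict the library to compounds whose heavy atom count (HAC) lies within ± 2 of the query, then compute the area under the ROC curve for how well a fingerprint distance ranks the predefined "actives" ahead of the rest. The sketch below uses the standard pairwise (Mann-Whitney) formulation of AUC; the function names and input layout are assumptions for illustration and are not taken from the paper.

```python
def size_constrained_subset(query_hac, hac_by_id, tolerance=2):
    """Ids of compounds whose heavy atom count is within query_hac ± tolerance."""
    return [
        compound_id
        for compound_id, hac in hac_by_id.items()
        if abs(hac - query_hac) <= tolerance
    ]


def roc_auc(distances, is_active):
    """AUC for ranking by ascending distance (smaller = more similar).

    Equals the fraction of (active, inactive) pairs in which the active
    has the smaller distance; ties count as half.
    """
    actives = [d for d, flag in zip(distances, is_active) if flag]
    inactives = [d for d, flag in zip(distances, is_active) if not flag]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


if __name__ == "__main__":
    # Toy ranking: two actives at distances 1 and 3, two inactives at 2 and 4.
    print(roc_auc([1, 3, 2, 4], [True, True, False, False]))
```

An AUC of 50% corresponds to a random ranking, which is why the color scale in Figure 2 starts at that value.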

