Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.

Awale M, Jin X, Reymond JL - J Cheminform (2015)

Bottom Line: We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.


Affiliation: Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland.

ABSTRACT

Background: Tools to explore large compound databases in search of analogs of query molecules provide strategically important support in drug discovery, helping to identify available analogs of any given reference or hit compound by ligand-based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp), which efficiently encode molecular shape and pharmacophores but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures).
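The city-block distance used as the similarity measure is the sum of absolute differences between two count vectors. The sketch below ranks a small database against a query by this distance with a brute-force scan. It assumes toy 4-dimensional count fingerprints with made-up values (not real APfp output), and the function names are illustrative; it does not reproduce the database formatting the authors use for very fast searching, which is not described in this excerpt.

```python
from typing import List, Sequence, Tuple


def city_block(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block (Manhattan, L1) distance between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbors(
    query: Sequence[int], database: List[Sequence[int]], k: int = 3
) -> List[Tuple[int, int]]:
    """Return the k (index, distance) pairs closest to the query by brute-force scan.

    Ties are broken by database index so the ranking is deterministic.
    """
    scored = [(i, city_block(query, fp)) for i, fp in enumerate(database)]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]


# Toy 4-dimensional count fingerprints (hypothetical values, not real APfp data).
db = [
    [3, 5, 2, 0],
    [3, 4, 2, 1],
    [0, 1, 7, 4],
    [4, 5, 2, 0],
]
query = [3, 5, 2, 0]
print(nearest_neighbors(query, db, k=3))  # [(0, 0), (3, 1), (1, 2)]
```

A linear scan touches every entry, so it only illustrates the similarity measure itself; searching 23.2 million structures quickly requires some form of indexing on top of it.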

Results: Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances.
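The idea of counting atom pairs at increasing through-space distance intervals can be sketched for the all-atom case as follows. This is a minimal illustration under a hypothetical binning of 16 bins 1.0 Å wide with an open-ended last bin; the actual 3DAPfp interval boundaries, atom selection and any normalization are not given in this excerpt, and the atom categories used by 3DXfp are omitted. The coordinates would come from a 3D structure such as a ZINC conformer. Because a reflection preserves every interatomic distance, a fingerprint built purely from such distances gives identical values for mirror-image coordinates (the example checks this), so in a sketch like this only stereoisomers whose interatomic distances differ, such as diastereomers with different 3D geometries, can be told apart.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical binning (an assumption, not the published 3DAPfp intervals):
# 16 bins of 1.0 Å width; every distance >= 15 Å falls into the last bin.
BIN_WIDTH = 1.0
N_BINS = 16


def through_space_pair_counts(coords: Sequence[Coord]) -> List[int]:
    """Count all atom pairs by the bin of their through-space (Euclidean) distance."""
    counts = [0] * N_BINS
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)  # through-space distance, not a count of bonds
        counts[min(int(d // BIN_WIDTH), N_BINS - 1)] += 1
    return counts


# Four atoms at the corners of a 1.5 Å square (toy geometry).
square = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]
fp = through_space_pair_counts(square)
print(fp[:4])  # [0, 4, 2, 0]: four 1.5 Å edges, two ~2.12 Å diagonals

# Reflecting through the yz-plane preserves every distance, so the counts are unchanged.
mirrored = [(-x, y, z) for x, y, z in square]
print(fp == through_space_pair_counts(mirrored))  # True
```

A topological (2D) atom pair fingerprint would instead bin pairs by the number of bonds between them, which is why folded molecules, where atoms far apart in the bond graph come close in space, can be encoded very differently by the two approaches.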

Conclusions: 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects. Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp), as measured by the recovery of ROCS shape analogs by fingerprint similarity.


Figure 4: Recovery of DUD actives using various fingerprints. (A) Average AUC values and (C) enrichment factors at 5% (EF5%) for the recovery of 40 sets of actives in the directory of useful decoys (DUD) from the corresponding decoy sets by various fingerprints, using the city-block distance (CBD; violet bars) and the Tanimoto coefficient (grey bars) as scoring functions. (B) AUC values and (D) EF0.1% values for the recovery of DUD actives from the entire ZINC database. (E) Occupancy heat map of the molecular shape triangle by DUD actives and decoys (128,352 cpds; blue ≤ 2 cpds/pixel to magenta ≥ 150 cpds/pixel) and (F) by the entire ZINC database (23.2 M cpds; blue ≤ 50 cpds/pixel to magenta ≥ 10,000 cpds/pixel). See Additional file 1: Tables S1–S8 for detailed AUC and EF values and Additional file 1: Figures S4–S7 for ROC curves.
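The AUC and enrichment-factor (EF) values reported in this figure are computed from a list ranked by similarity to the reference. The sketch below uses common textbook definitions, assumed here rather than taken from the paper: AUC as the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy (ties counted as one half), and EF at a fraction x as the active rate in the top x of the list divided by the active rate in the whole list. A smaller distance to the reference means a better rank. The distances and labels are made-up toy values, and the function names are illustrative. At 0.1% of the 23.2 M ZINC compounds, the top of the list is 23,200 compounds, as stated in the accompanying text.

```python
from typing import Sequence


def roc_auc(distances: Sequence[float], is_active: Sequence[bool]) -> float:
    """Fraction of (active, decoy) pairs in which the active is closer to the reference.

    Ties count as one half. Quadratic in size, so only suitable for small examples.
    """
    actives = [d for d, a in zip(distances, is_active) if a]
    decoys = [d for d, a in zip(distances, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for da in actives:
        for dd in decoys:
            if da < dd:
                wins += 1.0
            elif da == dd:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(
    distances: Sequence[float], is_active: Sequence[bool], fraction: float
) -> float:
    """Active rate in the top `fraction` of the ranked list divided by the overall active rate."""
    n = len(distances)
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("need at least one active")
    n_top = max(1, round(fraction * n))  # e.g. 0.001 * 23_200_000 -> 23,200
    ranked = sorted(range(n), key=lambda i: distances[i])  # closest first
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return (hits / n_top) / (total_actives / n)


# Toy ranking: 10 compounds, actives at distances 1, 3 and 6 (made-up values).
dist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
act = [True, False, True, False, False, True, False, False, False, False]
print(round(roc_auc(dist, act), 4))  # 0.8095 (17 of 21 active/decoy pairs)
print(round(enrichment_factor(dist, act, 0.20), 4))  # 1.6667
```

An EF of 1 means the top of the list is no richer in actives than the list as a whole; the maximum possible EF at a given fraction is bounded by how many actives can fit in the top slots.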

Mentions: The recovery of DUD actives from decoys and from the entire ZINC database was investigated as a second test of fingerprint performance [40-44]. For each DUD active set, the molecule closest to all other actives of the set in the corresponding fingerprint space was used as the reference molecule for the recovery study. LBVS for recovering the other actives from this reference molecule gave comparable results with either the city-block distance or the Tanimoto coefficient as the similarity measure (Figure 4A–D and Additional file 1: Figures S4–S7 and Tables S1–S8). 3DXfp, R3DXfp and Xfp stood out as the fingerprints with the highest average AUC values (~80%) and enrichment factors at 5% coverage (first 1000–2000 cpds, EF5% = 8–10) for the recovery of actives from the corresponding decoys. The other fingerprints scored significantly lower (AUC ~60–70%, EF5% ~2–8). The recovery of DUD actives from the entire ZINC database was good with all fingerprints (average AUC ~80–90%) except USR and PMIfp (average AUC ~75%); however, enrichment factors at 0.1% database coverage (first 23,200 cpds) were higher for the pharmacophore fingerprints (3DXfp, R3DXfp, Xfp, USRCAT) than for the shape-only fingerprints.
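The reference-selection step ("the molecule closest to all other actives") can be read as picking the medoid of the active set: the active with the smallest summed distance to all other actives. That reading is an assumption, as is the Tanimoto form used here (the common dot-product generalization to count vectors); the exact definitions used by the authors are not given in this excerpt. The selection function takes the distance as a parameter, so the same code covers both the city-block and the Tanimoto variants compared in the text. Fingerprint values and function names are hypothetical.

```python
from typing import Callable, List, Sequence

Fingerprint = Sequence[float]


def city_block(a: Fingerprint, b: Fingerprint) -> float:
    """Sum of absolute differences between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tanimoto_similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for count vectors: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 1.0 if denom == 0 else dot / denom  # two all-zero vectors count as identical


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Turn the similarity into a distance so that smaller means closer."""
    return 1.0 - tanimoto_similarity(a, b)


def select_reference(
    actives: List[Fingerprint], distance: Callable[[Fingerprint, Fingerprint], float]
) -> int:
    """Index of the active with the smallest summed distance to all other actives (medoid)."""
    totals = [
        sum(distance(a, b) for j, b in enumerate(actives) if j != i)
        for i, a in enumerate(actives)
    ]
    return min(range(len(actives)), key=lambda i: totals[i])


# Three toy active fingerprints (hypothetical values).
actives = [[2, 0, 1], [3, 1, 1], [6, 4, 2]]
print(select_reference(actives, city_block))         # 1
print(select_reference(actives, tanimoto_distance))  # 1
```

Here both measures pick the same reference; with real fingerprints they need not, which is one reason to compare the two scoring functions side by side.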