Accurate and efficient target prediction using a potency-sensitive influence-relevance voter.

Lusci A, Browning M, Fooshee D, Swamidass J, Baldi P - J Cheminform (2015)

Bottom Line: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments.

Affiliation: School of Information and Computer Sciences, University of California, Irvine, Irvine, USA.

ABSTRACT

Background: A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows.

Results: Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database.

Conclusions: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/.

Fig. 4: Simulated target-prediction experiment: AUC scores as dataset size grows. Average AUC (y axis) plotted as a function of the minimum number of training molecules (x axis). Each method’s ability to separate known actives from a background set of 9000 random ChEMBL molecules, assumed to be inactive, is measured. Training sets are not augmented.

Mentions: We address the inherent limitation of cross-validation by simulating a more realistic target-prediction experiment as a proxy. To do so, we use the same trained models from the tenfold cross-validation procedure, but augment their test sets with a background dataset of 9000 molecules drawn at random from ChEMBL (the Test-RC dataset). Here we report the results only for the SVM, RF, and PS-IRV models (Table 3; Fig. 4). For brevity, we report only the AUC results; enrichment values are available in Additional file 2 (Table S8).
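The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc` and `augment_with_background` and the toy scores are hypothetical, and the only thing taken from the text is the idea of appending background molecules, assumed inactive, to a held-out fold before computing the AUC.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a
    randomly chosen active scores higher than a randomly chosen
    inactive (ties count as one half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def augment_with_background(
    scores: Sequence[float],
    labels: Sequence[int],
    background_scores: Sequence[float],
) -> Tuple[List[float], List[int]]:
    """Append model scores for background molecules to a test fold,
    labelling every background molecule as inactive (an assumption,
    since random molecules are not experimentally confirmed inactives)."""
    all_scores = list(scores) + list(background_scores)
    all_labels = list(labels) + [0] * len(background_scores)
    return all_scores, all_labels


# Toy held-out fold: two known actives, two known inactives.
fold_scores = [0.9, 0.8, 0.3, 0.2]
fold_labels = [1, 1, 0, 0]

# Toy scores the same trained model assigns to three background molecules
# (the paper uses 9000 random ChEMBL molecules).
background = [0.85, 0.1, 0.05]

auc_plain = auc(fold_scores, fold_labels)
aug_scores, aug_labels = augment_with_background(fold_scores, fold_labels, background)
auc_augmented = auc(aug_scores, aug_labels)

print(f"AUC on the fold alone:     {auc_plain:.2f}")
print(f"AUC with background added: {auc_augmented:.2f}")
```

In this toy case a single high-scoring background molecule is enough to lower the AUC, which is the kind of effect the simulated screen is meant to expose. The pairwise loop is chosen for readability; a rank-based computation would be preferable for large folds.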

