Deciding when to stop: efficient experimentation to learn to predict drug-target interactions.

Temerinac-Ott M, Naik AW, Murphy RF - BMC Bioinformatics (2015)

Bottom Line: In drug discovery and development in particular, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can save up to 40 % of the total experiments while still yielding highly accurate predictions. We show that active learning accuracy can be predicted using simulated data, and that this prediction yields substantial savings in the number of experiments required to make accurate drug-target predictions.


Affiliation: Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany. temerina@frias.uni-freiburg.de.

ABSTRACT

Background: Active learning is a powerful tool for guiding an experimentation process. Instead of performing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. In drug discovery and development in particular, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. In practice, however, it is crucial to have a method for evaluating the quality of the current predictions and deciding when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved.
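Picking "the experiment that adds the most knowledge" can be illustrated with uncertainty sampling, the selection strategy the authors name in the figure discussion further down this page. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_most_uncertain` is hypothetical, and it assumes the current model's predicted interaction probabilities are already available as a targets × drugs matrix (rows are targets, columns are drugs), together with a boolean matrix marking experiments already performed.

```python
from typing import List, Optional, Tuple


def select_most_uncertain(probs: List[List[float]],
                          observed: List[List[bool]]) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return the (target, drug) index of the unobserved
    experiment whose predicted interaction probability is closest to 0.5."""
    best: Optional[Tuple[int, int]] = None
    best_margin = float("inf")
    for i, row in enumerate(probs):          # i indexes targets (rows)
        for j, p in enumerate(row):          # j indexes drugs (columns)
            if observed[i][j]:
                continue                     # experiment already performed
            margin = abs(p - 0.5)            # distance from the decision boundary
            if margin < best_margin:         # strict '<' keeps the first tie
                best_margin = margin
                best = (i, j)
    return best                              # None once every experiment is done
```

For example, with probabilities `[[0.9, 0.45], [0.2, 0.7]]` and only entry (0, 0) observed, the function returns `(0, 1)`, whose probability 0.45 is nearest 0.5.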

Results: We compute active learning traces on simulated drug-target matrices to learn a regression model for the accuracy of the active learner. By analyzing the performance of this regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can save up to 40 % of the total experiments while still yielding highly accurate predictions.
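As a rough illustration of learning a regression model for accuracy from simulated traces, the sketch below fits accuracy as a linear function of one feature by ordinary least squares. This is a deliberate simplification: the abstract does not specify the features or the form of the regression, so the single scalar feature, the linear model, and the name `fit_accuracy_regression` are all hypothetical. On simulated matrices the true accuracy at each time point is known, which is what makes such a fit possible; the returned predictor can then be applied to features computed on an unseen experimental matrix.

```python
from typing import Callable, Sequence


def fit_accuracy_regression(features: Sequence[float],
                            accuracies: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical sketch: fit accuracy ~ intercept + slope * feature by OLS."""
    n = len(features)
    if n < 2 or n != len(accuracies):
        raise ValueError("need at least two paired training points")
    mean_x = sum(features) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x: float) -> float:
        # Clip to [0, 1] because the predicted quantity is an accuracy.
        return min(1.0, max(0.0, intercept + slope * x))

    return predict
```

Clipping the output to [0, 1] is a design choice for this sketch; a real accuracy model would more likely use several features and be validated on held-out simulations.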

Conclusions: We show that active learning accuracy can be predicted using simulated data, and that this prediction yields substantial savings in the number of experiments required to make accurate drug-target predictions.
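A stopping rule based on predicted accuracy, and the savings it produces, can be expressed in a few lines. The sketch assumes a threshold of 0.9 (matching the 90 % predicted accuracy discussed with Fig. 3 below) and a trace of (experiments performed, predicted accuracy) pairs in the order the active learner produced them. The function name `stop_and_savings` and the numbers in the usage note are made up for illustration; they are not results from the paper.

```python
from typing import Optional, Sequence, Tuple


def stop_and_savings(trace: Sequence[Tuple[int, float]], total: int,
                     threshold: float = 0.9) -> Tuple[Optional[int], float]:
    """Hypothetical stopping rule.

    Returns (experiments performed when stopping, fraction of all experiments
    saved). If the threshold is never reached, every experiment is assumed
    to be performed, so (None, 0.0) is returned.
    """
    for done, predicted in trace:
        if predicted >= threshold:           # first time point above threshold
            return done, 1.0 - done / total
    return None, 0.0
```

With a hypothetical trace `[(250, 0.70), (500, 0.86), (750, 0.91), (1000, 0.95)]` over 1000 possible experiments, the rule stops at 750 experiments and reports a saving of 0.25, i.e. 25 % of the experiments are never run.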


Fig. 3: The true accuracy (black) and the predicted accuracy (red) are shown for the four data sets: (a) Nuclear Receptor, (b) GPCR, (c) Ion Channel, (d) Enzyme

Mentions: As discussed above, in practice we require a mechanism to decide when to stop experimentation. A good active learning method is not enough on its own: we must also be able to evaluate the accuracy of the whole model without acquiring all the data. To address this problem, we previously proposed a parametrization of perturbagen-target systems in which each system is characterized by its responsiveness (the probability that a perturbagen has an effect on a target) and its uniqueness (the probability that a perturbagen or target is different from the others) [18]. This permits simulating large numbers of systems to evaluate active learning strategies. We applied this approach by creating many simulated interaction matrices with uniqueness and responsiveness values in the range 0.05–0.95 and with kernel noise in the range 0–0.1. We then performed active learning simulations using our KBMF model and uncertainty sampling, and learned a regression function for the predicted accuracy. Because uniqueness and responsiveness are varied uniformly over 0.05–0.95, a wide range of possible interaction matrices is generated rather than only a special case (a subset of possible interaction matrices). From each interaction matrix, the ground-truth similarity matrices are computed from the similarity between rows (target kernel) and the similarity between columns (drug kernel). These ‘perfect’ similarity kernels are then perturbed with noise to obtain more realistic similarity matrices. Performance could be improved considerably by restricting the simulations to a subset of the parameter space; in general, however, it is not known beforehand which parameters describe the interaction matrix under consideration. The learned model therefore covers a large range of possible interaction matrices.

The results of applying the regression function to the features computed at each time point are shown in red in Fig. 3 for the four experimental data sets. On all four data sets, a predicted accuracy of 90 % coincides with a true accuracy of at least 90 %, and the predicted accuracies are a reasonable lower estimate of the true accuracy. Note that a predicted accuracy of 100 % does not imply that the true accuracy is 100 %; it is merely a prediction obtained by applying the learned regression model to the features at that time point, and therefore does not indicate that the system has been overfit.
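The kernel construction described above (target kernel from row similarity, drug kernel from column similarity, then added noise) can be sketched as follows. The passage does not say which similarity measure or noise model is used, so this sketch assumes a binary interaction matrix, the fraction of agreeing entries as the similarity, and zero-mean Gaussian noise with standard deviation `noise` added symmetrically off the diagonal. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matching_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """Fraction of positions where two binary vectors agree (assumed measure)."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def kernels_from_interactions(M: Sequence[Sequence[int]], noise: float = 0.0,
                              seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: return (target_kernel, drug_kernel) for a
    targets x drugs binary interaction matrix M, optionally perturbed."""
    rng = random.Random(seed)
    rows = [list(r) for r in M]         # one vector per target
    cols = [list(c) for c in zip(*M)]   # one vector per drug

    def kernel(vectors: List[List[int]]) -> Matrix:
        n = len(vectors)
        K = [[1.0] * n for _ in range(n)]       # diagonal kept at 1.0, noise-free
        for a in range(n):
            for b in range(a + 1, n):
                s = matching_similarity(vectors[a], vectors[b])
                s += rng.gauss(0.0, noise)      # zero-mean Gaussian perturbation
                K[a][b] = K[b][a] = s           # same value both ways: symmetric
        return K

    return kernel(rows), kernel(cols)
```

With `noise=0.0` the kernels are the ‘perfect’ ones: for the 2 × 3 matrix `[[1, 0, 1], [1, 0, 0]]` the two targets agree on 2 of 3 drugs, so their similarity is 2/3. A `noise` value within the 0–0.1 range mentioned above gives perturbed kernels.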

