Deciding when to stop: efficient experimentation to learn to predict drug-target interactions.

Temerinac-Ott M, Naik AW, Murphy RF - BMC Bioinformatics (2015)

Bottom Line: Especially for drug discovery and development, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can result in up to 40% savings in the total number of experiments while still obtaining highly accurate predictions. We show that active learning accuracy can be predicted using simulated data, and that this results in substantial savings in the number of experiments required to make accurate drug-target predictions.


Affiliation: Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany. temerina@frias.uni-freiburg.de.

ABSTRACT

Background: Active learning is a powerful tool for guiding an experimentation process. Instead of performing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. Especially for drug discovery and development, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. However, in practice, it is crucial to have a method for evaluating the quality of the current predictions and deciding when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved.
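The experiment-selection idea described above can be sketched in a few lines. This is a minimal illustration only, not the authors' method: the matrix size, the mean-fill predictor, and uncertainty sampling (picking the unmeasured entry whose prediction is closest to 0.5) are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 20x10 drug-target interaction matrix (1 = interaction).
true_matrix = (rng.random((20, 10)) > 0.7).astype(float)
observed = np.full(true_matrix.shape, np.nan)  # nothing measured yet

def predict(observed):
    """Naive stand-in predictor: fill unknown entries with the mean of
    the observed entries (0.5 when nothing has been measured)."""
    known = observed[~np.isnan(observed)]
    fill = known.mean() if known.size else 0.5
    return np.where(np.isnan(observed), fill, observed)

def pick_next(observed, pred):
    """Uncertainty sampling: choose the unmeasured entry whose predicted
    value is closest to 0.5; measured entries are excluded via -inf."""
    score = np.where(np.isnan(observed), -np.abs(pred - 0.5), -np.inf)
    return np.unravel_index(np.argmax(score), observed.shape)

# Run 30 "experiments" instead of all 200 possible ones.
for _ in range(30):
    pred = predict(observed)
    i, j = pick_next(observed, pred)
    observed[i, j] = true_matrix[i, j]  # perform the chosen experiment

accuracy = ((predict(observed) > 0.5) == (true_matrix > 0.5)).mean()
print(f"accuracy after 30 experiments: {accuracy:.2f}")
```

A real active learner would use a far stronger predictor (the paper uses kernel-based matrix completion on drug-target matrices); the point here is only the loop structure: predict, select the most informative experiment, measure, repeat.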

Results: We compute active learning traces on simulated drug-target matrices in order to determine a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can result in up to 40% savings in the total number of experiments while still obtaining highly accurate predictions.

Conclusions: We show that active learning accuracy can be predicted using simulated data, and that this results in substantial savings in the number of experiments required to make accurate drug-target predictions.



Fig. 4: The probability that a predicted accuracy is below or equal to the true accuracy, plotted against the threshold.

Mentions: Statistics on the performance of the accuracy predictor in simulations can be used to design a stopping rule [18]. We adopt this method to determine a threshold for stopping the active learning procedure. The simulated data are used to assess the probability that the true accuracy is greater than or equal to the predicted accuracy, using 11-fold cross-validation. The number of folds for cross-validation is essentially an arbitrary choice; by choosing 11-fold over 10-fold cross-validation we have slightly more training data available in each round. For each predicted accuracy value, we count how often this condition was fulfilled and divide by the total number of occurrences of that predicted value (Fig. 4). As expected, a low predicted accuracy has a high probability that the accuracy measured in the actual experiments will be higher. At the beginning of the active learning procedure only a small amount of data is available, so it is hard to make good predictions about the accuracy of the method. However, the more data is gathered during the active learning procedure, the more confident the predictor becomes, reaching a peak at which, for predicted accuracies of 0.8 and higher, the true accuracy meets or exceeds the prediction in 65% of the cases. For very high accuracies (>0.95), the chance that the actual accuracy exceeds the prediction naturally drops drastically. From Fig. 4, the best stopping threshold lies in the range 0.8 to 0.9. Since higher accuracy values are more desirable, our stopping rule was to terminate the active learning procedure when the predicted accuracy reached 0.9.
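The counting procedure behind Fig. 4 can be sketched as follows. This is a simplified illustration, not the authors' code: the synthetic (predicted, true) accuracy pairs stand in for the real simulation traces, and the binning scheme and noise model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for simulation results: pairs of predicted and
# true accuracy collected over many active-learning rounds.
predicted = rng.uniform(0.5, 1.0, size=5000)
true_acc = np.clip(predicted + rng.normal(0.0, 0.05, size=5000), 0.0, 1.0)

# Bin the predicted accuracies and, within each bin, estimate
# P(true accuracy >= predicted accuracy) -- the curve plotted in Fig. 4.
bins = np.linspace(0.5, 1.0, 11)
idx = np.digitize(predicted, bins) - 1
prob = np.array([
    (true_acc[idx == b] >= predicted[idx == b]).mean()
    if (idx == b).any() else np.nan
    for b in range(len(bins) - 1)
])

# Stopping rule: halt active learning once the predicted accuracy
# reaches the chosen threshold of 0.9.
THRESHOLD = 0.9

def should_stop(pred_acc):
    return pred_acc >= THRESHOLD

print(prob)
print(should_stop(0.92))  # True: 0.92 >= 0.9, so stop
```

In the paper, the per-bin probabilities come from simulated drug-target matrices evaluated with 11-fold cross-validation; the threshold of 0.9 is then applied unchanged to previously unseen experimental matrices.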

