Cheminformatic models based on machine learning for pyruvate kinase inhibitors of Leishmania mexicana.

Jamal S, Scaria V - BMC Bioinformatics (2013)

Bottom Line: The random forest based approach was determined to be the most sensitive and also showed better accuracy and ROC performance than the other classifiers. We further added a substructure based approach to analyze the molecules and identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would lead to a reduction in the cost and length of clinical studies, so that newer drugs would reach the market faster and provide better healthcare options to patients.


Affiliation: GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi, 110007, India. vinods@igib.in.

ABSTRACT

Background: Leishmaniasis is a neglected tropical disease that affects approximately 12 million individuals worldwide and is caused by the parasite Leishmania. The drugs currently used to treat leishmaniasis are highly toxic, and the widespread emergence of drug-resistant strains necessitates the development of new therapeutic options. The available high-throughput screening data have made it possible to generate computational predictive models that can assess the active scaffolds in a chemical library, followed by evaluation of their ADME/toxicity properties in biological trials.

Results: In the present study, we used publicly available high-throughput screening datasets of chemical moieties that have been adjudged to target the pyruvate kinase enzyme of L. mexicana (LmPK). A machine learning approach was used to create computational models capable of predicting the biological activity of novel antileishmanial compounds. Further, we evaluated the molecules using a substructure based approach to identify the common substructures contributing to their activity.

Conclusion: We generated computational models based on machine learning methods and evaluated their performance using various statistical figures of merit. The random forest based approach was determined to be the most sensitive and also showed better accuracy and ROC performance than the other classifiers. We further added a substructure based approach to analyze the molecules and identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would lead to a reduction in the cost and length of clinical studies, so that newer drugs would reach the market faster and provide better healthcare options to patients.

Figure 2: Plot of Sensitivity and Specificity of models generated based on molecular descriptors.

Mentions: A number of models were generated using the different classifiers described in the materials and methods section. The best model for each classifier was selected on the basis of accuracy. In the present study, all the models generated had around 80% accuracy (Figure 1). Other statistical measures, namely sensitivity, specificity and BCR, were also used to check the robustness of the models. Because accuracy alone cannot be used to assess model performance when the data are highly imbalanced, we used the Balanced Classification Rate (BCR), the average of sensitivity and specificity, which weights the two equally and gives a more reliable measure of performance. All the models had around 80% sensitivity and specificity, with the RF model being the most sensitive and NB the least (Figure 2). The RF model also turned out to be the most accurate classifier, with a BCR of 83%. We also performed a Receiver Operating Characteristic (ROC) analysis to compare and evaluate the efficiency and robustness of each model. All the models had a significant Area Under the Curve (AUC) on the ROC plot, which can be seen in Figure 3. Among all the classifiers, i.e. NB, RF, SMO and J48, Random Forest performed better than the rest and was established as the best classifier, providing good overall classification.
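The figures of merit named above can be illustrated with a minimal Python sketch. It is not the authors' code: the class and function names are hypothetical, and the confusion-matrix counts and scores are invented purely for illustration, not taken from the study. It shows why accuracy can mislead on an imbalanced screen while BCR, the average of sensitivity and specificity, does not, and how an AUC can be computed as the fraction of active/inactive pairs that a model ranks correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (active = positive class)."""
    tp: int  # actives predicted active
    fn: int  # actives predicted inactive
    tn: int  # inactives predicted inactive
    fp: int  # inactives predicted active


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: fraction of actives recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: fraction of inactives rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all compounds classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def bcr(c: ConfusionCounts) -> float:
    """Balanced Classification Rate: mean of sensitivity and specificity."""
    return 0.5 * (sensitivity(c) + specificity(c))


def roc_auc(active_scores, inactive_scores) -> float:
    """AUC as the probability that a random active outscores a random
    inactive (ties count as half), i.e. the Mann-Whitney formulation."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical imbalanced screen: 50 actives, 950 inactives.
# A model that calls everything inactive still scores 95% accuracy.
always_inactive = ConfusionCounts(tp=0, fn=50, tn=950, fp=0)
# A model that actually separates the classes (illustrative counts only).
useful_model = ConfusionCounts(tp=40, fn=10, tn=760, fp=190)

if __name__ == "__main__":
    for name, c in [("always inactive", always_inactive),
                    ("useful model", useful_model)]:
        print(f"{name}: accuracy={accuracy(c):.2f} "
              f"sensitivity={sensitivity(c):.2f} "
              f"specificity={specificity(c):.2f} BCR={bcr(c):.2f}")
    # Made-up prediction scores for three actives and three inactives.
    print(f"AUC={roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]):.3f}")
```

In this invented example the trivial model reaches 95% accuracy but only 50% BCR, which is the kind of imbalance effect the text gives as the reason for reporting BCR alongside accuracy.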

