Cheminformatic models based on machine learning for pyruvate kinase inhibitors of Leishmania mexicana.

Jamal S, Scaria V - BMC Bioinformatics (2013)

Bottom Line: The random forest based approach was determined to be the most sensitive and also showed better accuracy and ROC. We further added a substructure-based approach to analyze the molecules and identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would reduce the cost and length of clinical studies, so that newer drugs would reach the market faster and provide better healthcare options to patients.


Affiliation: GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi, 110007, India. vinods@igib.in.

ABSTRACT

Background: Leishmaniasis is a neglected tropical disease that affects approximately 12 million individuals worldwide and is caused by the parasite Leishmania. The current drugs used to treat leishmaniasis are highly toxic, and drug-resistant strains have emerged widely, which necessitates the development of new therapeutic options. The available high-throughput screening data have made it possible to generate computational predictive models that can assess the active scaffolds in a chemical library, followed by assessment of their ADME/toxicity properties in biological trials.

Results: In the present study, we used publicly available high-throughput screening datasets of chemical moieties that have been adjudged to target the pyruvate kinase enzyme of L. mexicana (LmPK). A machine learning approach was used to create computational models capable of predicting the biological activity of novel antileishmanial compounds. Further, we evaluated the molecules using a substructure-based approach to identify the common substructures contributing to their activity.

Conclusion: We generated computational models based on machine learning methods and evaluated their performance using various statistical figures of merit. The random forest based approach was determined to be the most sensitive and also showed better accuracy and ROC. We further added a substructure-based approach to analyze the molecules and identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would reduce the cost and length of clinical studies, so that newer drugs would reach the market faster and provide better healthcare options to patients.

Figure 4: Molecular alignment of the 7 [[1]-[7]] enriched substructures (dark green) over the top 20 molecules of the active (1087) dataset obtained from PubChem (AID 1721).

Mentions: We further evaluated whether we could identify the common or frequent molecular substructures associated with molecular activity. To this end, all the compounds in the active dataset were clustered using the LibMCS algorithm. We obtained a total of 3,418 substructures clustered up to 6 levels. A total of 501 clusters at level 6 were selected, from which 331 singletons were separated. For the remaining 170 substructures, which correspond to the clusters, we calculated the chi-square statistic and p-value to analyze their enrichment and its significance in the active and inactive datasets (Table 2). Substructures with a frequency of >1% in the active dataset were retained, which accounted for a total of 10 substructures. Stringent filtering retrieved a total of 7 substructures with p-values less than 0.01 and an enrichment factor >2. We aligned the selected 7 enriched substructures with the active molecules (Figure 4) and inactive molecules to calculate the enrichment of the scaffolds between the active and inactive datasets.
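The filtering step described above (frequency >1% in the actives, p-value <0.01, enrichment factor >2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the text does not define the enrichment factor, so it is assumed here to be the ratio of a substructure's frequency in the active set to its frequency in the inactive set; the chi-square test is assumed to be a Pearson test on a 2x2 contingency table without continuity correction; and the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a: actives containing the substructure,   b: actives without it
    c: inactives containing the substructure, d: inactives without it
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the upper tail is P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def enrichment_factor(a, n_active, c, n_inactive):
    """ASSUMED definition: frequency in actives / frequency in inactives."""
    freq_inactive = c / n_inactive
    if freq_inactive == 0:
        return math.inf
    return (a / n_active) / freq_inactive


def passes_filters(a, n_active, c, n_inactive,
                   min_freq=0.01, max_p=0.01, min_ef=2.0):
    """Apply the three thresholds quoted in the text to one substructure."""
    _, p_value = chi_square_2x2(a, n_active - a, c, n_inactive - c)
    ef = enrichment_factor(a, n_active, c, n_inactive)
    return (a / n_active) > min_freq and p_value < max_p and ef > min_ef
```

With made-up counts, `passes_filters(50, 1000, 100, 10000)` evaluates a substructure found in 5% of actives and 1% of inactives, while `passes_filters(5, 1000, 100, 10000)` is rejected by the frequency threshold alone. The counts are illustrative and are not taken from Table 2.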

