Machine Learning Models and Pathway Genome Data Base for Trypanosoma cruzi Drug Discovery.

Ekins S, de Siqueira-Neto JL, McCall LI, Sarker M, Yadav M, Ponder EL, Kallel EA, Kellar D, Chen S, Arkin M, Bunin BA, McKerrow JH, Talcott C - PLoS Negl Trop Dis (2015)

Bottom Line: Ninety-seven compounds were selected for in vitro testing, and 11 of these were found to have EC50 < 10 μM. We have demonstrated how combining chemoinformatics and bioinformatics for T. cruzi drug discovery can bring interesting in vivo active molecules to light that may have been overlooked. The approach we have taken is broadly applicable to other NTDs.


Affiliation: Collaborative Drug Discovery, Burlingame, California, United States of America; Collaborations in Chemistry, Fuquay-Varina, North Carolina, United States of America.

ABSTRACT

Background: Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity.

Methodology/principal findings: In the present study we developed a computational approach that utilized data from several public whole-cell, phenotypic high throughput screens that have been completed for T. cruzi by the Broad Institute, including a single screen of over 300,000 molecules in the search for chemical probes as part of the NIH Molecular Libraries program. We have also compiled and curated relevant biological and chemical compound screening data including (i) compounds and biological activity data from the literature, (ii) high throughput screening datasets, and (iii) predicted metabolites of T. cruzi metabolic pathways. This information was used to help us identify compounds and their potential targets. We have constructed a Pathway Genome Data Base for T. cruzi. In addition, we have developed Bayesian machine learning models that were used to virtually screen libraries of compounds. Ninety-seven compounds were selected for in vitro testing, and 11 of these were found to have EC50 < 10 μM. We progressed five compounds to an in vivo mouse efficacy model of Chagas disease and validated that the machine learning model could identify in vitro active compounds not in the training set, as well as known positive controls. The antimalarial pyronaridine possessed 85.2% efficacy in the acute Chagas mouse model. We have also proposed potential targets (for future verification) for this compound based on structural similarity to known compounds with targets in T. cruzi.

Conclusions/significance: We have demonstrated how combining chemoinformatics and bioinformatics for T. cruzi drug discovery can bring interesting in vivo active molecules to light that may have been overlooked. The approach we have taken is broadly applicable to other NTDs.
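The virtual-screening step described in the abstract can be illustrated with a small sketch. The abstract says only that Bayesian machine learning models were used, so the Bernoulli naive Bayes classifier below is an assumption chosen for illustration, not the authors' implementation. The fingerprint bits, compound names, and the `train`/`score` helpers are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class BayesModel:
    log_prior_ratio: float   # log(P(active) / P(inactive)), smoothed
    p_act: List[float]       # P(bit on | active), one entry per bit
    p_inact: List[float]     # P(bit on | inactive), one entry per bit


def train(samples: List[Tuple[Set[int], bool]], n_bits: int) -> BayesModel:
    """Fit per-bit 'on' probabilities for each class with add-one smoothing."""
    n_act = sum(1 for _, active in samples if active)
    n_inact = len(samples) - n_act
    on_act = [0] * n_bits
    on_inact = [0] * n_bits
    for bits, active in samples:
        counts = on_act if active else on_inact
        for b in bits:
            counts[b] += 1
    return BayesModel(
        log_prior_ratio=math.log((n_act + 1) / (n_inact + 1)),
        p_act=[(c + 1) / (n_act + 2) for c in on_act],
        p_inact=[(c + 1) / (n_inact + 2) for c in on_inact],
    )


def score(model: BayesModel, bits: Set[int]) -> float:
    """Log-odds that a compound is active; higher means more active-like."""
    total = model.log_prior_ratio
    for i, (pa, pi) in enumerate(zip(model.p_act, model.p_inact)):
        if i in bits:
            total += math.log(pa / pi)
        else:
            total += math.log((1 - pa) / (1 - pi))
    return total


# Toy training set (hypothetical): sets of 'on' fingerprint bits with activity labels.
training = [
    ({0, 1}, True), ({0, 1, 2}, True), ({0, 3}, True),
    ({4, 5}, False), ({3, 4}, False), ({2, 5}, False),
]
model = train(training, n_bits=6)

# Hypothetical library to screen, ranked from most to least active-like.
library: Dict[str, Set[int]] = {"cand_1": {0, 1}, "cand_2": {4, 5}, "cand_3": {0, 4}}
ranked = sorted(library, key=lambda name: score(model, library[name]), reverse=True)
print(ranked)
```

In a workflow of this kind, the top-ranked library compounds would be the ones carried forward to in vitro testing; the score is a ranking aid rather than a calibrated probability.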


pntd.0003878.g001: A typical metabolic cellular overview of TCruCyc provided by the Pathway Tools web server. This view of the TCruCyc PGDB shows the (almost entirely) inferred set of metabolic pathways from gene sequence data. Canonical pathways such as “Amino Acids Biosynthesis”, “Amino Acids Degradation”, “Nucleosides and Nucleotides Biosynthesis”, “Fatty Acids and Lipids Biosynthesis” and “Respiration” are partially inferred as well as a large set of single reaction steps (right side) that Pathway Tools could integrate into larger pathways. This is an expected level of derivable connectivity that would be available from annotated genome and proteome sequence data. We expect that a significant number of unassigned protein functions can be assigned by extending Pathway Tools with (high threshold) automated sequence similarity analysis that is currently done via manual curation.

Mentions: A PGDB was constructed for T. cruzi using the complete genome sequence of the Dm28c strain (Fig 1). The underlying genome sequence consisted of 5,287 contigs assembled into 1,378 scaffolds of 30,716,540 base pairs. PathoLogic found 11,349 distinct gene products, at least 880 of which were enzymes and at least 16 of which were transporters. From these assignments, PathoLogic inferred 1,030 enzymatic reactions and 122 pathways, as well as the existence of 806 metabolic compounds. This set was filtered to 358 molecules after removal of compounds with R-groups and small nuisance molecules. The filtered dataset was then used to infer potential targets by comparing each metabolite with a phenotypic screening hit using Tanimoto similarity [42].
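The similarity comparison in the last step can be sketched as follows. This is a minimal illustration, assuming each molecule has already been reduced to a set of "on" fingerprint bits. The excerpt does not state the fingerprint type, so the bit sets, metabolite names, and the `rank_metabolites` helper are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits present in either set."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_metabolites(
    hit: Set[int], metabolites: Dict[str, Set[int]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the metabolites most similar to a screening hit, best first."""
    scored = [(name, tanimoto(hit, fp)) for name, fp in metabolites.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy fingerprints (hypothetical): one screening hit and three pathway metabolites.
hit_fp = {1, 4, 7, 9, 12}
metabolite_fps = {
    "metabolite_A": {1, 4, 7, 9, 13},
    "metabolite_B": {2, 3, 5},
    "metabolite_C": {1, 4, 20, 21},
}

ranked = rank_metabolites(hit_fp, metabolite_fps)
for name, similarity in ranked:
    print(f"{name}: {similarity:.2f}")
```

A metabolite that scores highly against a hit suggests that the enzymes acting on that metabolite are candidate targets for the hit, which is the sense in which the excerpt describes similarity being used to infer potential targets.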

