Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models.

Benedict MN, Mundy MB, Henry CS, Chia N, Price ND - PLoS Comput. Biol. (2014)

Bottom Line: We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.

Affiliation: Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America.

ABSTRACT
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. This indicates that phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling, and therefore that other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
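
As a rough illustration of how gene-level likelihoods could be rolled up into a reaction likelihood, the Python sketch below assumes an aggregation rule that this page does not state: a protein complex is taken to be only as likely as its least likely subunit, and a reaction as likely as its most likely complex. The gene names, the likelihood values, and the function name `reaction_likelihood` are all hypothetical and are not taken from the paper or from KBase.

```python
from typing import Dict, List


def reaction_likelihood(
    complexes: List[List[str]],
    gene_likelihood: Dict[str, float],
) -> float:
    """Aggregate gene annotation likelihoods into one reaction likelihood.

    ASSUMPTION (not stated in the source text):
      - complex likelihood  = minimum over its subunit genes
      - reaction likelihood = maximum over its alternative complexes
    Genes without an estimate are treated as likelihood 0.0.
    """
    per_complex = (
        min(gene_likelihood.get(gene, 0.0) for gene in subunits)
        for subunits in complexes
        if subunits  # skip empty complexes
    )
    # A reaction with no associated complexes gets likelihood 0.0.
    return max(per_complex, default=0.0)


# Hypothetical gene likelihoods, for illustration only.
example_gene_likelihood = {"geneA": 0.9, "geneB": 0.4, "geneC": 0.7}

# Two alternative complexes: (geneA AND geneB) OR (geneC).
example_complexes = [["geneA", "geneB"], ["geneC"]]

example_value = reaction_likelihood(example_complexes, example_gene_likelihood)
```

Under this assumed rule the two-subunit complex is limited by `geneB`, so the single-gene alternative `geneC` determines the reaction likelihood.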


Figure 3. Proof of principle: Gap filling highly-likely reactions in B. subtilis. B. subtilis synthesizes lipids via the non-mevalonate pathway (blue) [37]. We removed this pathway from the B. subtilis genome-scale model and then tried to fill the gap using both the likelihood and parsimony-based approaches. The parsimony-based gap filling approach instead filled the gap with the mevalonate pathway (red), which is shorter but not supported by genetic evidence. The likelihood-based approach filled the gap with the correct pathway. Black indicates reactions that were not knocked out (there was no explicit link to literature evidence in the B. subtilis model). The numeric labels are the computed likelihoods of gap filling reactions.

Mentions: Although both methods had the same number of tuning parameters available, likelihood-based gap filling outperformed the parsimony-based method, replacing a maximum of 31 of the 32 gold-standard reactions. Parsimony-based gap filling replaced a maximum of only 24 reactions, regardless of the chosen penalties (Dataset S2). The failures in parsimony-based gap filling resulted from picking shorter pathways to fill gaps for which longer pathways are the correct choice. For example, the synthesis of isopentenyl diphosphate (IPDP), a primary precursor for lipid synthesis, can occur by one of two routes, the mevalonate pathway and the non-mevalonate pathway [35]. B. subtilis uses the non-mevalonate pathway for IPDP synthesis [36], [37]. The mevalonate pathway contains fewer reactions than the non-mevalonate pathway, and thus the parsimony-based gap filling approach incorrectly used the mevalonate pathway to restore IPDP production (Figure 3). However, all of the knocked out reactions in the non-mevalonate pathway had high estimated likelihoods. Hence, likelihood-based gap filling correctly chose this pathway to restore production of IPDP.
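
The difference between the two objectives can be sketched with a toy comparison in Python. This is not the paper's optimization (actual gap filling searches a whole reaction database rather than comparing two fixed candidates); it only shows why counting reactions favors the shorter pathway while a likelihood-weighted cost favors the better-supported one. The pathway lengths, the likelihood values, and the cost rule `1 - likelihood` are assumptions made for this illustration.

```python
from typing import Callable, Dict, List

# Hypothetical candidate pathways for restoring IPDP production.
# Each entry lists one made-up likelihood per reaction to be added.
CANDIDATES: Dict[str, List[float]] = {
    # Shorter route, weak genomic support (illustrative values).
    "mevalonate": [0.05, 0.10, 0.05, 0.10, 0.05, 0.10],
    # Longer route, strong genomic support (illustrative values).
    "non-mevalonate": [0.95, 0.90, 0.95, 0.90, 0.95, 0.90, 0.95],
}


def parsimony_cost(likelihoods: List[float]) -> float:
    """Parsimony: every added reaction costs the same, so fewer is better."""
    return float(len(likelihoods))


def likelihood_cost(likelihoods: List[float]) -> float:
    """ASSUMED rule: an added reaction costs 1 - likelihood,
    so well-supported reactions are cheap to add."""
    return sum(1.0 - p for p in likelihoods)


def pick(
    candidates: Dict[str, List[float]],
    cost: Callable[[List[float]], float],
) -> str:
    """Return the name of the candidate pathway with the lowest total cost."""
    return min(candidates, key=lambda name: cost(candidates[name]))


parsimony_choice = pick(CANDIDATES, parsimony_cost)
likelihood_choice = pick(CANDIDATES, likelihood_cost)
```

With these made-up numbers the parsimony objective selects the six-reaction mevalonate route, while the likelihood-weighted objective selects the seven-reaction non-mevalonate route because each of its reactions is cheap to add.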

