Directing experimental biology: a case study in mitochondrial biogenesis.

Hibbs MA, Myers CL, Huttenhower C, Hess DC, Li K, Caudy AA, Troyanskaya OG - PLoS Comput. Biol. (2009)

Bottom Line: Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.

Affiliation: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, New Jersey, United States of America.

ABSTRACT
Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.

Figure 4 (pcbi-1000322-g004): Individual method accuracy and overlap. Three computational methods and an ensemble of those methods were used to select candidates for experimental evaluation. Of the 183 predictions evaluated in our first iteration, 88 were chosen from the top 40 results of at least one individual method, while the remaining 95 were selected from the ensemble of all three. (A) The accuracy of the predictions chosen from each method, from genes selected by the ensemble, and the overall accuracy for all candidates tested in our first iteration. (B) Overlap between candidates selected from the individual methods. Each individual method performs with similar accuracy but predicts unique genes.

Mentions: In addition to demonstrating the accuracy of computational function prediction approaches, our results also emphasize the importance of considering the specific biological nature of predictions. Specifically, our results show that different computational approaches can produce equally accurate - but distinct - predictions depending on the algorithmic foundation and underlying data of each method. Although we did not attempt a comprehensive study of all types of computational function prediction methods, the three methods used in this study included both supervised and unsupervised approaches utilizing different data sources, and our observations are likely to be generally applicable. To demonstrate this generality, we have also analyzed additional canonical computational function prediction approaches (a Support Vector Machine (SVM) trained using only microarray data, an SVM trained using diverse data, and unsupervised correlation across microarray data). This additional analysis supports the results and conclusions presented below and is fully discussed in Text S1. Each of the three function prediction methods employed in this study achieved similarly high rates of phenotypic positives (Figure 4A). However, there was a relatively small overlap between the 40 most confident predictions of each method, as only 8% of the 88 total candidates selected from an individual method were common to all three (Figure 4B). True positive rates were similar among genes predicted confidently by only one method or by multiple methods, indicating that each computational approach was accurately predicting disparate aspects of mitochondrial organization and biogenesis. This variation can be accounted for both by differences in the underlying data and by algorithmic diversity among the computational approaches. As discussed below, such differences among methods should be carefully considered when developing new prediction techniques or applying them in a biological setting.
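
The overlap comparison described above can be illustrated with a short sketch. It is not the authors' analysis code: the function names, the dictionary layout, and the toy gene identifiers are all assumptions made for illustration. Given one ranked candidate list per method, it reports how many top-k candidates are shared (the kind of count summarized in Figure 4B, where 8% of 88 candidates corresponds to roughly 7 genes) and the fraction of confirmed candidates grouped by how many methods selected them.

```python
from collections import Counter
from itertools import combinations


def top_k_overlap(rankings, k=40):
    """Summarize how the top-k candidates of several methods overlap.

    rankings: dict mapping a method name to a list of gene identifiers,
              ordered from most to least confident (hypothetical layout).
    """
    top = {name: set(genes[:k]) for name, genes in rankings.items()}
    union = set().union(*top.values())
    common = set.intersection(*top.values())
    # Number of shared candidates for every pair of methods.
    pairwise = {
        (a, b): len(top[a] & top[b])
        for a, b in combinations(sorted(top), 2)
    }
    return {
        "union_size": len(union),
        "common_to_all": len(common),
        "fraction_common": len(common) / len(union) if union else 0.0,
        "pairwise": pairwise,
    }


def positive_rate_by_support(rankings, confirmed, k=40):
    """Fraction of confirmed candidates, grouped by the number of
    methods that placed the gene in their top k."""
    support = Counter(
        gene for genes in rankings.values() for gene in set(genes[:k])
    )
    totals, hits = Counter(), Counter()
    for gene, n_methods in support.items():
        totals[n_methods] += 1
        hits[n_methods] += gene in confirmed  # True counts as 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy data: gene names and validation results are invented.
toy_rankings = {
    "method_a": ["g1", "g2", "g3", "g4"],
    "method_b": ["g1", "g5", "g2", "g6"],
    "method_c": ["g1", "g7", "g8", "g2"],
}
toy_confirmed = {"g1", "g3", "g5"}

overlap = top_k_overlap(toy_rankings, k=3)
rates = positive_rate_by_support(toy_rankings, toy_confirmed, k=3)
```

In this toy example, similar rates across the support groups would correspond to the observation that genes predicted by a single method were confirmed about as often as genes predicted by several.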