An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Lee I, Li Z, Marcotte EM - PLoS ONE (2007)

Bottom Line: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome).


Affiliation: Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America.

ABSTRACT

Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.

Methodology/principal findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.

Conclusions/significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.

Figure 2. Assigning confidence scores to physical or genetic interactions. Performance of the hypergeometric probabilistic score is shown for gene functional associations inferred from (A) protein-protein physical interactions measured by the high-throughput yeast two hybrid (Y2H) screen of Ito et al. [20], (B) affinity-purified complexes identified by mass spectrometry by Gavin et al. [52], and (C) genetic interactions [43], [74]. Performance with the probability score is measured cumulatively for each successive bin of 200 interactions (A–C, red filled triangles), ranked by probability score. Recall and precision are calculated using the reference linkages derived from Gene Ontology “biological process” annotation masking the term “protein biosynthesis”. The Y2H core model described in [20] (A, filled circle) is more precise than the complete data set (A, open circle), but with reduced recall. Similarly, two different ways of inferring binary linkages from mass spectrometry-derived protein complexes [22], the spoke (B, filled circle) and matrix (B, open circle) models, show differing trade-offs between precision and recall. The set of binary genetic interactions (C, open circle) shows very low precision for functional inferences, although the false positive rate of genetic interactions is generally perceived to be low; in contrast, the hypergeometric probability identifies a functionally informative subset of linkages. In general, the hypergeometric probability scores provide an excellent ranking of interactions in each of the data sets consistent with the linkages' functional informativeness.
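
The cumulative evaluation described in this caption can be illustrated with a short Python sketch. It is not the authors' code: the function name, the input format, and the convention that pairs absent from the reference set are left unjudged are all assumptions made for illustration. Pairs are assumed to be already sorted from most to least confident.

```python
def cumulative_precision_recall(ranked_pairs, positives, negatives, bin_size=200):
    """Return a list of (recall, precision) points, one per cumulative bin.

    ranked_pairs: list of (protein_a, protein_b), best score first.
    positives / negatives: sets of frozenset pairs from a reference set
    (hypothetical inputs; pairs in neither set are left unjudged).
    """
    tp = fp = 0
    points = []
    for idx, pair in enumerate(ranked_pairs, start=1):
        key = frozenset(pair)  # order of the two proteins does not matter
        if key in positives:
            tp += 1
        elif key in negatives:
            fp += 1
        # Record a point at the end of each bin and at the end of the list.
        if idx % bin_size == 0 or idx == len(ranked_pairs):
            judged = tp + fp
            precision = tp / judged if judged else 0.0
            recall = tp / len(positives) if positives else 0.0
            points.append((recall, precision))
    return points
```

Plotting the returned points with recall on one axis and precision on the other gives a curve of the kind the caption describes for each data set.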

Mentions: A probabilistic model of protein-protein interactions should bypass the limitations of these coarse descriptive models, while providing higher resolution scoring important for data integration. We found that calculating the hypergeometric probability of the protein interactions occurring by random chance in a given data set generates a very well-behaved ranking of interaction accuracy in recall-precision analyses (Figure 2A&B). Note that this approach does not require training; instead, confidence is based only upon observations in the experiment under analysis and reflects the specificity with which a particular protein pair interacts, down-weighting promiscuous interactors and rewarding well-observed specific interactions. This scoring scheme outperforms the spoke model and attaches confidence values to each interaction in the matrix model, thereby separating high and low confidence matrix model interactions (Figure 2A&B). The hypergeometric score appears to work equally effectively for yeast two-hybrid and mass spectrometry interactions.
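
As a rough illustration of how such a score behaves, the following Python sketch computes a hypergeometric tail probability for one protein pair. The parameterization is an assumption, since the excerpt does not give the exact formula: `k` is taken to be the number of times the pair is observed together, `n` and `m` the total number of observed interactions involving each protein, and `N` the total number of interactions in the data set. The function names are hypothetical.

```python
from math import comb, log

def hypergeom_tail(k, n, m, N):
    """P(X >= k) for a hypergeometric draw of m items from N, n of them marked."""
    if not (0 <= k <= min(n, m) and n <= N and m <= N):
        raise ValueError("need 0 <= k <= min(n, m) and n, m <= N")
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    return tail / comb(N, m)

def interaction_score(k, n, m, N):
    """Negative log of the chance probability; larger means more specific."""
    if not (0 <= k <= min(n, m) and n <= N and m <= N):
        raise ValueError("need 0 <= k <= min(n, m) and n, m <= N")
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    # Work with logs of exact integers to avoid floating-point underflow.
    return log(comb(N, m)) - log(tail)

# Same number of co-observations, very different specificity (made-up counts).
specific = interaction_score(k=3, n=4, m=5, N=1000)
promiscuous = interaction_score(k=3, n=200, m=300, N=1000)
```

With these made-up counts, the pair whose partners are rarely seen elsewhere receives the higher score, while the pair of promiscuous interactors scores near zero, which matches the down-weighting behavior described above.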

