An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Lee I, Li Z, Marcotte EM - PLoS ONE (2007)

Bottom Line: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome).


Affiliation: Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America.

ABSTRACT

Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.

Methodology/principal findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.

Conclusions/significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.

Figure 1: The effect of functionally biased Gene Ontology annotation on network training. (A) Frequency histograms of the usage of 1,067 Gene Ontology “biological process” annotations, ranked by the number of genes annotated with each term (black bars) and by the number of reference linkages derived using that term (white bars). Functional annotation is highly biased towards genes with the term “protein biosynthesis”. This functional bias becomes more severe in the reference linkages, given the combinatorial increase after linking all genes sharing a given term. As a result, linkages among protein biosynthesis genes compose >27% of total reference linkages. By contrast, the second most frequent term accounts for <5% of total reference linkages. (B) The likelihood of functional association between genes on the basis of the co-expression of their mRNAs across DNA microarray experiments (here, following heat-shock [19]) is significantly affected by the dominant reference term “protein biosynthesis”. For example, for the 1,000 most strongly co-expressed gene pairs, the likelihood of functional association between co-expressed genes is ∼30 fold higher than random chance (LLS∼3.4) (empty circles), but drops to ∼6 fold (LLS∼1.8) after masking the term “protein biosynthesis” in the reference set (filled circles). Thus, the high likelihood score from the biased reference set cannot be generalized to other functions. The black and red lines indicate sigmoid curve fits to the unbiased and biased reference analyses, respectively.
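The fold-enrichment and LLS values quoted in the caption are consistent with a natural logarithm: ln(30) ≈ 3.4 and ln(6) ≈ 1.8. The sketch below assumes that the log likelihood score is the log of the ratio between the odds of a functional linkage given the evidence and the prior odds of linkage. This excerpt does not state the formula, so the function name, its parameters, and the counts are illustrative assumptions only.

```python
import math


def log_likelihood_score(linked_with_evidence, unlinked_with_evidence,
                         linked_total, unlinked_total):
    """Natural log of (odds of linkage given the evidence) / (prior odds).

    Assumed form of the score; the argument names are hypothetical.
    """
    odds_given_evidence = linked_with_evidence / unlinked_with_evidence
    prior_odds = linked_total / unlinked_total
    return math.log(odds_given_evidence / prior_odds)


# Hypothetical counts chosen so that the enrichment over chance is 30-fold:
# odds of 3.0 among co-expressed pairs versus prior odds of 0.1.
lls = log_likelihood_score(30, 10, 100, 1000)
print(round(lls, 1))          # 3.4, matching the ~30-fold case in the caption
print(round(math.log(6), 1))  # 1.8, matching the ~6-fold case after masking
```

Under this reading, a drop from LLS∼3.4 to LLS∼1.8 corresponds to a five-fold loss of enrichment, which is why a single dominant term can inflate the apparent value of a data set.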

Mentions: The most comprehensive and reliable functional annotation currently available for yeast is the Gene Ontology [17] annotation set. More than 70% of validated yeast protein-encoding genes are annotated by at least one of over 1,000 Gene Ontology “biological process” terms with support derived from reliable small-scale experimental evidence. Therefore, yeast Gene Ontology “biological process” annotation meets the first two requirements of a good reference set for efficient learning. However, the frequency distribution of annotation terms is heavily biased toward the single term “protein biosynthesis” (GO:0006412). This term alone is responsible for >27% of the total reference gene pairs (Figure 1A). We observed a similar bias in another widely used annotation set, the Kyoto Encyclopedia of Genes and Genomes (KEGG) [18] (data not shown).
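The way a dominant term grows in weight when annotations are turned into gene pairs can be illustrated with a toy example. A term shared by n genes yields n(n-1)/2 pairs, so its share of reference linkages exceeds its share of annotated genes. The sketch below is not the authors' pipeline: the gene names, the terms other than GO:0006412, and the function names are hypothetical, and shares are computed from per-term pair counts without removing pairs that share more than one term.

```python
from itertools import combinations


def reference_pairs(term_to_genes, masked_terms=()):
    """Link every two genes that share a term, skipping masked terms."""
    pairs = set()
    for term, genes in term_to_genes.items():
        if term in masked_terms:
            continue
        pairs.update(combinations(sorted(set(genes)), 2))
    return pairs


def pair_share(term_to_genes):
    """Fraction of per-term pair counts contributed by each term."""
    counts = {t: len(set(g)) * (len(set(g)) - 1) // 2
              for t, g in term_to_genes.items()}
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


# Toy annotation set (hypothetical genes; term_B and term_C are made up).
toy = {
    "GO:0006412": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "term_B": ["g7", "g8", "g9"],
    "term_C": ["g10", "g11"],
}

gene_share = len(toy["GO:0006412"]) / sum(len(g) for g in toy.values())
print(round(gene_share, 2))                      # 0.55 of annotated genes
print(round(pair_share(toy)["GO:0006412"], 2))   # 0.79 of reference pairs
print(len(reference_pairs(toy)))                 # 19 pairs in total
print(len(reference_pairs(toy, masked_terms={"GO:0006412"})))  # 4 after masking
```

Masking the dominant term, as in Figure 1B, removes most of the toy reference set, which shows why scores trained on the unmasked set mostly reflect that one function.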

