An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Lee I, Li Z, Marcotte EM - PLoS ONE (2007)

Bottom Line: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome).


Affiliation: Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America.

ABSTRACT

Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.

Methodology/principal findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.

Conclusions/significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.


Figure 3. Optimizing the inference of linkages from mRNA co-expression. (A) Examples of a functionally informative DNA microarray data set and a non-informative one. Each set is shown as a scatter plot of the log likelihood of functional association for each successive bin of 1,000 gene pairs (circles), ranked by decreasing Pearson correlation coefficient between expression vectors derived from that array set. The microarray data set measuring oxidative stress responses following menadione treatment [75] (filled circles) shows no significant relationship between co-expression and the likelihood of functional association. In contrast, the set of cell cycle time course experiments [76] (open circles) shows a strong relationship. (B) The effect of filtering genes using the parameters M and R. A data set of genes changing expression during the diauxic shift [77] (open circles) shows a noisy relationship between co-expression and the likelihood of functional association, especially for the gene pairs with the highest Pearson correlation coefficients. Introducing the two threshold parameters improves the relationship (filled circles), considerably decreasing the variance and improving the corresponding regression model. (C) The divide-test-integrate strategy [1] for inferring linkages, calculated here across all 500 microarray experiments (open triangles), considerably outperforms analysis of the expression vectors constructed by concatenating the 500 experiments (filled circles). Precision is measured using reference linkages derived from MIPS functional annotation, masking the term “protein synthesis”; recall is calculated for either reference linkages or total yeast genes (inset).
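The binning analysis in panel (A) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard log-likelihood score LLS = ln[(P(L|E)/P(¬L|E)) / (P(L)/P(¬L))], where L is "the two genes share a pathway" and E is membership in a given PCC bin (the Methods section defining LLS is not reproduced here, so treat this form as an assumption). The function names `pair_pccs` and `binned_lls` and the reference-set inputs `ref_pos`/`ref_neg` are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def pair_pccs(expr):
    """Return (i, j, pcc) for every gene pair; expr is a genes x arrays matrix."""
    corr = np.corrcoef(expr)  # Pearson correlation between rows (genes)
    n_genes = expr.shape[0]
    return [(i, j, float(corr[i, j])) for i, j in combinations(range(n_genes), 2)]


def binned_lls(pairs, ref_pos, ref_neg, bin_size=1000):
    """Rank pairs by descending PCC, cut them into bins, and score each bin.

    ref_pos / ref_neg: sets of frozenset({i, j}) reference pairs that do / do not
    share a pathway (hypothetical inputs standing in for the paper's reference set).
    Returns a list of (mean_pcc, lls) tuples, one per bin.
    """
    prior_odds = len(ref_pos) / len(ref_neg)  # P(L) / P(not L) over the reference set
    ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
    results = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        keys = [frozenset((i, j)) for i, j, _ in chunk]
        pos = sum(k in ref_pos for k in keys)  # reference-linked pairs in this bin
        neg = sum(k in ref_neg for k in keys)  # reference-unlinked pairs in this bin
        if pos == 0 or neg == 0:
            lls = float("nan")  # odds are undefined without both classes in the bin
        else:
            lls = math.log((pos / neg) / prior_odds)
        mean_pcc = sum(p[2] for p in chunk) / len(chunk)
        results.append((mean_pcc, lls))
    return results
```

Plotting `lls` against bin rank (or `mean_pcc`) reproduces the kind of curve shown in panel (A): an informative data set gives LLS values that fall as PCC decreases, while a non-informative one stays flat near zero.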

Mentions: Beyond filtering genes, we also removed entire data sets that proved uninformative for reconstructing a functional network. For each data set, we ranked all gene pairs in descending order of co-expression, measured as the Pearson correlation coefficient (PCC) of their expression levels across the arrays under consideration, and divided them into successive bins of 1,000 pairs. For each bin, we measured the likelihood of functional association as the log likelihood of the genes belonging to the same pathway (LLS, see Methods). Of the 18 DNA microarray data sets from SMD [26], comprising 581 individual array experiments, 14 showed a significant relationship between co-expression and LLS (e.g., cell cycle; Figure 3A) and 4 showed no relationship (e.g., oxidative stress with menadione; Figure 3A), as listed in Tables 2 and 3. Alternative measures of expression correlation (the non-parametric Spearman rank correlation coefficient and mutual information) did not improve performance over PCC. Filtering out unresponsive genes as described above further strengthened the relationships, as shown for one example in Figure 3B. To ensure representation of housekeeping genes, the 14 informative array sets were also concatenated into monolithic expression vectors spanning 500 experiments and analyzed for co-expression linkages in the same way. The benefits of the divide-test-integrate method, in which each data set is analyzed separately before the results are integrated, are evident in its higher precision at any given coverage of genes or reference linkages, as shown in Figure 3C on the independent MIPS protein functional linkage reference set (excluding the term “protein synthesis”).
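The data-set selection and concatenation steps can be sketched as follows, again as an illustration under stated assumptions rather than the authors' procedure. The text only says that informative sets showed a "significant relationship" between co-expression and LLS; the criterion used here (a least-squares slope of LLS against mean bin PCC of at least a hypothetical `min_slope`) is a stand-in chosen for simplicity. The function names are hypothetical, and `concatenate_sets` assumes every expression matrix lists the same genes in the same row order.

```python
import numpy as np


def regression_slope(binned):
    """Fit LLS ~ a * mean_pcc + b over (mean_pcc, lls) bins, skipping NaN bins."""
    x = np.array([b[0] for b in binned], dtype=float)
    y = np.array([b[1] for b in binned], dtype=float)
    ok = ~np.isnan(y)
    if ok.sum() < 2:
        return float("nan")  # not enough scored bins to fit a line
    slope, _intercept = np.polyfit(x[ok], y[ok], 1)
    return float(slope)


def select_informative(binned_by_set, min_slope=1.0):
    """Hypothetical criterion: keep data sets whose LLS rises with PCC by >= min_slope."""
    return [name for name, binned in binned_by_set.items()
            if regression_slope(binned) >= min_slope]


def concatenate_sets(expr_by_set, names):
    """Join the arrays (columns) of the chosen sets into one monolithic matrix.

    Assumes each genes x arrays matrix shares the same gene rows in the same order.
    """
    return np.hstack([expr_by_set[n] for n in names])
```

In a divide-test-integrate workflow, each selected set would be scored on its own (for example with the binning sketch accompanying Figure 3) and the per-set scores combined afterwards; the concatenated matrix is simply one more data set to score, which is how the text uses it to capture housekeeping genes.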


An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Lee I, Li Z, Marcotte EM - PLoS ONE (2007)

Optimizing the inference of linkages from mRNA co-expression.(A) Examples of a functionally informative DNA microarray data set and a non-informative one. Each set is illustrated as a scatter plot showing the log likelihood of functional association for each successive bin of 1,000 gene pairs (circles) ranked by decreasing Pearson correlation coefficient between expression vectors derived from that array set. The set of microarray data measuring oxidative stress responses following Menadione treatment [75] (filled circles) does not show a significant relationship between co-expression and the likelihood of functional association. In contrast, the set of cell cycle time course experiments [76] (open circles) shows a strong relationship. The effect of filtering genes using the parameters M and R is illustrated in (B). A data set of genes changing expression during the diauxic shift [77] (open circles) shows a noisy relationship between co-expression and the likelihood of functional association, especially for gene pairs with the highest Pearson correlation coefficients. However, by introducing the two threshold parameters, the relationship improves (filled circles), in particular decreasing variance considerably and improving the corresponding regression model. (C) The divide-test-integrate strategy [1] for inferring linkages, shown here calculated across all 500 microarray experiments (empty triangles) considerably outperforms analysis of the expression vectors constructed by concatenating the 500 experiments (filled circles). Precision is measured using reference linkages derived from MIPS functional annotation, masking the term “protein synthesis”, and recall is calculated for either reference linkages or total yeast genes (inset).
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC1991590&req=5

pone-0000988-g003: Optimizing the inference of linkages from mRNA co-expression.(A) Examples of a functionally informative DNA microarray data set and a non-informative one. Each set is illustrated as a scatter plot showing the log likelihood of functional association for each successive bin of 1,000 gene pairs (circles) ranked by decreasing Pearson correlation coefficient between expression vectors derived from that array set. The set of microarray data measuring oxidative stress responses following Menadione treatment [75] (filled circles) does not show a significant relationship between co-expression and the likelihood of functional association. In contrast, the set of cell cycle time course experiments [76] (open circles) shows a strong relationship. The effect of filtering genes using the parameters M and R is illustrated in (B). A data set of genes changing expression during the diauxic shift [77] (open circles) shows a noisy relationship between co-expression and the likelihood of functional association, especially for gene pairs with the highest Pearson correlation coefficients. However, by introducing the two threshold parameters, the relationship improves (filled circles), in particular decreasing variance considerably and improving the corresponding regression model. (C) The divide-test-integrate strategy [1] for inferring linkages, shown here calculated across all 500 microarray experiments (empty triangles) considerably outperforms analysis of the expression vectors constructed by concatenating the 500 experiments (filled circles). Precision is measured using reference linkages derived from MIPS functional annotation, masking the term “protein synthesis”, and recall is calculated for either reference linkages or total yeast genes (inset).
Mentions: Beyond filtering genes, we also removed entire data sets that proved uninformative for reconstructing a functional network: We measured the relationship between the degree of co-expression between two genes, measured as the Pearson correlation coefficient (PCC) of their expression levels across the arrays under consideration, and the likelihood of their functional association, measured by the log likelihood of belonging to the same pathway (LLS, see Methods) between the genes in each successive bin of 1000 gene pairs ranked in descending order by PCC. Across 18 total sets of DNA microarrays from SMD [26], containing 581 individual array experiments, we found 14 sets showed a significant relationship (e.g., Cell cycle; Figure 3A) and 4 sets showed no relationship (e.g., Oxidative stress with Menadione; Figure 3A), as listed in Tables 2 and 3. Alternate measures of expression correlation (the non-parametric Spearman rank coefficient and mutual information measures) failed to improve performance over PCC. Filtering the unresponsive genes as described above further improved the relationships, as shown for an example in Figure 3B. In order to ensure representation of housekeeping genes, the 14 informative array sets were also concatenated into monolithic expression vectors spanning 500 experiments and analyzed for co-expression linkages as above. The benefits of the divide-test-integrate method are illustrated in the improved precision for any given coverage of genes or reference linkages, as shown in Figure 3C on the independent MIPS protein functional linkage reference set (excluding the term “protein synthesis”).

Bottom Line: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae.Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome).

View Article: PubMed Central - PubMed

Affiliation: Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America.

ABSTRACT

Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.

Methodology/principal findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.

Conclusions/significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.

Show MeSH
Related in: MedlinePlus