Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M - J. Biol. (2006)

Bottom Line: Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. We show that the LC dataset considerably improves the predictive power of network-analysis approaches.


Affiliation: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada. teresa.reguly@utoronto.ca

ABSTRACT

Background: The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.

Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.
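As an illustration of what a coverage figure such as the 14% above measures, the following minimal sketch computes the fraction of reference (LC) interactions that also appear in a query (HTP) set. It assumes interactions are undirected pairs and that self-interactions are ignored; the function names and the placeholder gene identifiers are hypothetical and are not taken from the LC dataset or from the authors' analysis.

```python
def normalize(pairs):
    """Treat interactions as undirected: (A, B) and (B, A) are one edge.

    Self-interactions are dropped (an assumption of this sketch).
    """
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def coverage(reference_pairs, query_pairs):
    """Fraction of reference interactions that are also found in the query set."""
    reference = normalize(reference_pairs)
    query = normalize(query_pairs)
    if not reference:
        raise ValueError("reference set contains no interactions")
    return len(reference & query) / len(reference)


# Toy data with placeholder identifiers (not real interactions).
toy_lc = [
    ("geneA", "geneB"),
    ("geneB", "geneA"),  # same interaction reported in the other orientation
    ("geneA", "geneC"),
    ("geneD", "geneA"),
    ("geneE", "geneF"),
]
toy_htp = [("geneB", "geneA"), ("geneG", "geneH")]

toy_coverage = coverage(toy_lc, toy_htp)
print(f"HTP coverage of LC interactions: {toy_coverage:.0%}")
```

Normalizing both sets before intersecting them keeps the same interaction, reported in either orientation, from being counted twice.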

Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.


Figure 8: Correlation of interactions with protein abundance and localization. (a) Statistical enrichment of interaction pairs as a function of protein abundance for each indicated dataset. Protein or gene pairs were separated into bins representing increasing protein abundance as derived from a genome-wide analysis [67] and shaded according to enrichment over chance distribution (the scale bar indicates the fraction of total interactions, with lighter regions indicating enrichment). Inf indicates infinity. Raw abundance distributions in each dataset are provided in Additional data file 3. (b) Correlation ratios of interactions between proteins of different locality for LC-PI and LC-GI networks. Blue regions in the diagonal indicate that interactions within the locality group are enhanced, while the off-diagonal red regions indicate that interactions of proteins from different localities are suppressed. Nodes with multiple localities were treated as missing values. Proteome-wide localization annotation [68] was available for 1,404 proteins (around 52%) in the LC dataset. The expected number of interactions was generated using 200 iterations of randomized versions of both original networks. Random networks were generated by an edge-swapping procedure, which maintains the degree-distribution, and localization assignments were shuffled among those nodes that had a single locality (the scale bar indicates fold enrichment over chance).
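The randomization described in the caption for panel (b) can be sketched roughly as follows: edges are rewired by pairwise swaps, which leaves every node's degree unchanged, and locality labels are shuffled among nodes that carry a single locality. This is a simplified illustration, not the authors' code. It collapses the per-locality matrix of the figure into a single same-locality count, and the number of swaps per iteration, the function names, and the toy network are all assumptions.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Map each node to its number of incident edges."""
    return Counter(node for edge in edges for node in edge)


def edge_swap(edges, n_swaps, rng):
    """Degree-preserving rewiring: (a, b), (c, d) -> (a, d), (c, b).

    Swaps that would create a self-loop or a duplicate edge are rejected.
    """
    edges = [tuple(e) for e in edges]
    if len(edges) < 2:
        return edges
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # randomize the orientation of the second edge
            c, d = d, c
        if a == d or c == b:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def same_locality_count(edges, locality):
    """Count edges whose two endpoints share one annotated locality."""
    return sum(
        1 for a, b in edges
        if a in locality and b in locality and locality[a] == locality[b]
    )


def fold_enrichment(edges, locality, iterations=200, seed=0):
    """Observed same-locality edges divided by the mean over random networks."""
    rng = random.Random(seed)
    observed = same_locality_count(edges, locality)
    annotated = list(locality)  # only nodes with a single locality
    total = 0
    for _ in range(iterations):
        rewired = edge_swap(edges, 10 * len(edges), rng)
        labels = [locality[n] for n in annotated]
        rng.shuffle(labels)
        total += same_locality_count(rewired, dict(zip(annotated, labels)))
    expected = total / iterations
    return observed / expected if expected else float("inf")


# Toy network with placeholder node names (not real proteins).
toy_edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5"),
             ("n5", "n6"), ("n6", "n1"), ("n1", "n4")]
toy_locality = {"n1": "nucleus", "n2": "nucleus", "n3": "nucleus",
                "n4": "cytoplasm", "n5": "cytoplasm", "n6": "cytoplasm"}
print(fold_enrichment(toy_edges, toy_locality, iterations=200))
```

Nodes with multiple localities are simply left out of the `toy_locality` mapping, which mirrors the caption's treatment of them as missing values.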

Mentions: The abundance of most predicted proteins in yeast has recently been determined [67]. Comparison of this dataset with all protein- and genetic-interaction datasets revealed that highly abundant proteins were more likely to exhibit detectable physical interactions, whereas low-abundance proteins were more likely to exhibit genetic interactions (Figure 8a). Both LC-PI and HTP-PI datasets exhibited a significant positive bias towards abundant proteins (r = 0.06, P = 0.0025 and r = 0.19, P = 2 × 10^-26 respectively, Spearman rank correlation), while LC-GI and HTP-GI exhibited a significant but weak negative bias (r = -0.06, P = 0.005 and r = -0.11, P = 9 × 10^-4 respectively, Spearman rank correlation). Interestingly, despite a stronger overall negative correlation with protein abundance, the systematic genetic analyses in the HTP-GI dataset were more uniformly distributed across protein-abundance bins, whereas the LC-GI interactions were more strongly represented in the lowest-abundance bins. This latter observation suggests that the phenotypes studied by conventional genetics may be focused on regulatory processes controlled by low-abundance proteins.
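For readers unfamiliar with the statistic quoted above, a Spearman rank correlation is a Pearson correlation computed on ranks, with tied values given their average rank. The sketch below is a minimal standard-library version. The text does not state exactly which per-protein quantities were correlated, so pairing abundance with number of interactions is an assumption here, the toy values are invented for illustration, and no P value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Invented toy values: protein abundance and interaction count per protein.
toy_abundance = [100, 500, 2000, 8000, 30000]
toy_degree = [1, 2, 3, 4, 5]
rho = spearman(toy_abundance, toy_degree)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the statistic is insensitive to the very wide dynamic range of abundance values, which is one reason a rank-based measure suits this kind of comparison.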

