Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M - J. Biol. (2006)

Bottom Line: Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. We show that the LC dataset considerably improves the predictive power of network-analysis approaches.


Affiliation: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada. teresa.reguly@utoronto.ca

ABSTRACT

Background: The study of complex biological networks and the prediction of gene function have been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well-substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.

Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.

Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.


Figure 7: Overlap of physical and genetic interaction pairs. (a) Overlap between LC-PI and LC-GI datasets. (b) Overlap between HTP-PI and HTP-GI datasets. (c) Overlap between LC-PI and HTP-GI datasets. (d) Overlap between LC-GI and HTP-PI datasets.

Mentions: Protein interactions by definition represent connections within complexes or along pathways, whereas genetic interactions typically represent functional connections of one sort or another between pathways [4,12,64]. We used the Osprey visualization tool [65] to represent and overlay protein- and genetic-interaction networks for the LC and HTP datasets. Given the perceived orthogonality of physical and genetic interaction space based on HTP studies [12], the LC-PI and LC-GI networks exhibited an unexpectedly high degree of overlap, at 12% of all protein interactions and 17% of all genetic interactions (Figure 7a). Of the 1,409 overlap pairs, 442 corresponded to interactions between essential proteins, while an additional 488 corresponded to interactions between an essential and a nonessential protein. The essential gene or protein content of the overlapping set of interactions was not substantially different from the input LC-PI and LC-GI datasets, nor was there pronounced enrichment or depletion for synthetic lethality or any other type of genetic interaction in the overlap dataset (see Additional data file 1). In striking contrast, overlap between the HTP-PI and HTP-GI networks was virtually nonexistent (Figure 7b), as has been previously noted [12]. This minimal overlap was due to the properties of the HTP-GI network, as the HTP-GI overlap with LC-PI was also minimal (Figure 7c), whereas the overlap between HTP-PI and LC-GI was significant (Figure 7d). Because essential genes were not enriched in the LC-PI/LC-GI overlap set, the under-representation of essential genes in the HTP-GI network [10,12,13] cannot explain the minimal overlap of HTP-GI with the LC-PI and HTP-PI networks (Figure 7b,c). It has been noted that proteins that exhibit more physical interactions tend also to exhibit more genetic interactions [66]. 
Indeed, the average number of physical connections for the nodes in the LC-PI/LC-GI overlap set was 7.7, compared with 3.2 for the remainder of the nodes in LC-PI. This feature does not, however, explain the discrepancy between the LC-GI and HTP-GI datasets, because both had very similar physical connectivity distributions. Interestingly, for half (706 of 1,409) of the interactions that overlap in the LC-PI and LC-GI datasets, the physical and genetic evidence mapped back to the same publication, suggesting that investigators may often test specific interactions in order to support initial observations. This bias may help drive overlap between the LC-PI and LC-GI datasets.
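
The overlap percentages and the connectivity comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy node labels (A to G) and the choice to treat interactions as undirected pairs with self-interactions dropped are all assumptions made for illustration, and the numbers it prints have nothing to do with the LC or HTP datasets.

```python
from collections import defaultdict


def normalize(pairs):
    """Collapse (A, B) and (B, A) into one undirected pair.

    Assumption: self-interactions (A, A) are dropped.
    """
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def overlap_stats(pi_pairs, gi_pairs):
    """Shared pairs, and their share of each input network."""
    pi, gi = normalize(pi_pairs), normalize(gi_pairs)
    shared = pi & gi
    return {
        "shared": shared,
        "frac_of_pi": len(shared) / len(pi) if pi else 0.0,
        "frac_of_gi": len(shared) / len(gi) if gi else 0.0,
    }


def mean_degree(pairs, nodes):
    """Mean number of distinct partners, within `pairs`, of the given nodes."""
    degree = defaultdict(int)
    for pair in normalize(pairs):
        for node in pair:
            degree[node] += 1
    nodes = list(nodes)
    return sum(degree[n] for n in nodes) / len(nodes) if nodes else 0.0


# Hypothetical toy networks, not real yeast data.
toy_pi = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "D"), ("E", "F")]
toy_gi = [("B", "A"), ("C", "G"), ("E", "F")]

stats = overlap_stats(toy_pi, toy_gi)

# Nodes that take part in at least one shared pair, and the remaining PI nodes.
overlap_nodes = set().union(*stats["shared"])
rest_nodes = set().union(*normalize(toy_pi)) - overlap_nodes

print(len(stats["shared"]), stats["frac_of_pi"], stats["frac_of_gi"])
print(mean_degree(toy_pi, overlap_nodes), mean_degree(toy_pi, rest_nodes))
```

In this toy case, two of the four distinct physical pairs also appear as genetic pairs, and the nodes in the shared pairs have a higher mean physical degree (1.5) than the remaining nodes (1.0). This is the same kind of comparison as the 7.7 versus 3.2 figures reported in the text.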

