Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskya OG, Ideker T, Dolinski K, Batada NN, Tyers M - J. Biol. (2006)

Bottom Line: Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. We show that the LC dataset considerably improves the predictive power of network-analysis approaches.


Affiliation: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada. teresa.reguly@utoronto.ca

ABSTRACT

Background: The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.

Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.

Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.


Figure 6: Connectivity of essential nodes. (a) Essential nodes tend to be more highly connected in the LC-PI and LC-GI networks. k is the measure of connectivity. (b) Essential-essential interactions are significantly enriched in the LC-PI and HTP-PI datasets but to a lesser extent in the LC-GI dataset. NN, nonessential-nonessential pairs; NE, nonessential-essential pairs, EE, essential-essential pairs. (c) The fraction of neighbors that are essential for LC-PI and HTP-PI networks. Only those nodes with connectivity greater than 3 were considered (n = 1,473 for LC-PI and n = 1,627 for HTP-PI). Compared with HTP-PI, a larger fraction of the immediate neighborhood of essential proteins in the LC-PI is composed of essential genes. (d) Clustering coefficient distribution for physical networks (top panel) and genetic networks (bottom panel). Average clustering coefficients and correlation coefficients were respectively: 0.53 and -0.56 for LC-PI, 0.38 and -0.54 for HTP-PI, 0.50 and -0.61 for LC-GI, 0.53 and -0.67 for HTP-GI. All correlations were computed using Spearman rank correlation and were statistically significant at P < 1e-100.
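
The two per-node quantities in panels (c) and (d) can be illustrated with a minimal sketch. This is not the authors' code: the edge list, the essential set, and every function name below are hypothetical, and the clustering coefficient is the standard local definition (the fraction of a node's neighbor pairs that are themselves linked), which is assumed here to be the measure plotted in panel (d).

```python
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from (a, b) pairs; self-loops are dropped."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def essential_neighbor_fraction(adj, essential, node):
    """Fraction of a node's immediate neighbors that are essential (panel c)."""
    neighbors = adj[node]
    return sum(n in essential for n in neighbors) / len(neighbors)


def clustering_coefficient(adj, node):
    """Local clustering coefficient: linked neighbor pairs / possible neighbor pairs."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for k < 2; reported as 0 by convention in this sketch
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Toy network (hypothetical data, not taken from the LC or HTP datasets).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
             ("B", "C"), ("D", "E"), ("B", "D")]
toy_essential = {"A", "B", "C"}
toy_adj = build_adjacency(toy_edges)

# Panel (c) considers only nodes with connectivity greater than 3.
well_connected = [n for n in toy_adj if len(toy_adj[n]) > 3]
fractions = {n: essential_neighbor_fraction(toy_adj, toy_essential, n)
             for n in well_connected}
clustering = {n: clustering_coefficient(toy_adj, n) for n in toy_adj}
```

In the toy network only node A passes the k > 3 filter; half of its neighbors are essential, and half of its neighbor pairs are linked to each other.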

Mentions: Random removal of nodes in HTP two-hybrid interaction networks does not affect the overall topology of the network, whereas deletion of highly connected nodes tends to break the network into many smaller components [22]. The likelihood that deletion of a given gene is lethal correlates with the number of interaction partners associated with it in the network. Thus, highly connected proteins with a central role in network architecture are three times more likely to be essential than are proteins with only a small number of links to other proteins. The LC-PI dataset exhibited a strong positive correlation between connectivity and essentiality, whereas the LC-GI dataset exhibited a modest positive correlation (r = 0.35, P < 1 × 10^-91 and r = 0.11, P < 1 × 10^-7, respectively; Figure 6a). Indeed, in the LC-PI dataset, essential proteins had on average twice as many interactions as nonessential proteins (<k> = 11.7 and 5.2, respectively, P < 1 × 10^-100, Mann-Whitney U test). This analysis buttresses the inference that highly connected genes are more likely to be essential [19]. Although it has been suggested that essentiality is caused by connectivity [22], this notion seems unlikely because 44% of the proteins in the LC-PI dataset that were highly connected (k > 10) were nonessential. We note that essentiality, as narrowly defined by growth under optimal nutrient conditions, is open to interpretation. Indeed, if the definition of essentiality is broadened to include inviability under more stressful conditions [2], the correlation with connectivity is substantially weaker, although still statistically significant (N.N.B., unpublished data).
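
The comparisons in this paragraph (mean connectivity by class, the share of highly connected nodes that are nonessential, and a rank correlation between connectivity and essentiality) can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the paper: the toy edge list, the essential set, and all function names are hypothetical; the hub threshold is a parameter (the text uses k > 10, the toy example uses k > 2); and Spearman correlation is used because the Figure 6 caption names it, computed here as Pearson correlation on tie-averaged ranks. The Mann-Whitney U test is not implemented.

```python
from statistics import mean


def degrees(edges):
    """Connectivity k per node, counting each undirected pair once and ignoring self-loops."""
    deg, seen = {}, set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def nonessential_hub_fraction(deg, essential, hub_threshold):
    """Share of nodes with k > hub_threshold that are nonessential."""
    hubs = [n for n, k in deg.items() if k > hub_threshold]
    return sum(n not in essential for n in hubs) / len(hubs)


# Toy network (hypothetical data, not taken from the LC-PI dataset).
toy_edges = [("H1", "a"), ("H1", "b"), ("H1", "c"), ("H1", "d"),
             ("H2", "b"), ("H2", "c"), ("H2", "e"), ("a", "b")]
toy_essential = {"H1", "a"}

deg = degrees(toy_edges)
nodes = sorted(deg)
k_values = [deg[n] for n in nodes]
is_essential = [1 if n in toy_essential else 0 for n in nodes]

mean_k_essential = mean(deg[n] for n in nodes if n in toy_essential)
mean_k_nonessential = mean(deg[n] for n in nodes if n not in toy_essential)
rho = spearman(k_values, is_essential)
hub_share = nonessential_hub_fraction(deg, toy_essential, hub_threshold=2)
```

With real data, `toy_edges` would be replaced by the interaction pairs of interest and `toy_essential` by a list of essential genes; a significance test for the difference in connectivity would still need to be added separately.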

