Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M - J. Biol. (2006)

Bottom Line: Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. We show that the LC dataset considerably improves the predictive power of network-analysis approaches.


Affiliation: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada. teresa.reguly@utoronto.ca

ABSTRACT

Background: The study of complex biological networks and the prediction of gene function have been enabled by high-throughput (HTP) methods for detecting genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well-substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.

Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, comparable in number to all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.
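The coverage figure quoted in the Results is a simple ratio. One plausible reading, assumed here rather than taken from the paper's methods, is the number of LC protein interactions that also occur in the HTP protein-interaction datasets, divided by the total number of LC protein interactions. The minimal Python sketch below treats interactions as undirected pairs (so A-B and B-A count as the same interaction); the function names and protein identifiers are hypothetical toy values, not real data.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# An undirected interaction is stored as an unordered pair of protein IDs.
Pair = FrozenSet[str]


def to_pairs(edges: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Normalize (a, b) tuples so that A-B and B-A map to the same key."""
    return {frozenset(edge) for edge in edges}


def coverage(reference: Iterable[Tuple[str, str]],
             query: Iterable[Tuple[str, str]]) -> float:
    """Fraction of reference interactions (e.g. LC-PI) also found in query (e.g. HTP-PI)."""
    ref = to_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & to_pairs(query)) / len(ref)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the LC or HTP datasets.
    lc = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P5", "P6")]
    htp = [("P2", "P1"), ("P6", "P7")]
    print(f"{coverage(lc, htp):.0%}")  # prints 25%
```

Because coverage is measured relative to the reference set, swapping the arguments gives a different quantity (the fraction of HTP interactions that are also in the literature), so the direction of the comparison matters.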

Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.

Figure 10: Interactions from the LC dataset dominate the composition of predicted protein complexes. (a) Contribution of HTP-PI and LC-PI data to predicted protein complexes. Each of the 420 predicted complexes is binned according to the percentage of LC (blue) or HTP (red) interactions it contains. The two distributions are not exact complements because some interactions are members of both LC-PI and HTP-PI. (b) Overlap of predicted protein complexes with known protein complexes defined by co-purification (the gold standard). A predicted complex and a gold-standard complex are scored as a hit when their protein sets have a Jaccard similarity of at least 0.13. Top panel: green bars indicate the percentage of gold-standard complexes hit by at least one predicted complex; the sum of the green and yellow bars is the percentage of predicted complexes hit by at least one gold-standard complex. Bottom panel: the percentage of proteins in gold-standard complexes that are represented in at least one predicted complex. This gives a rough upper bound on the percentage of gold-standard complexes that can be hit. (c) Complexes conserved between yeast and Drosophila are enriched in LC-PI interactions. This histogram is analogous to that shown for yeast-only complexes in panel (a). (d) Example of orthology between yeast and fly protein complexes in a cytoskeletal control network. The high degree of LC-PI interconnection among yeast proteins (orange) validates fly HTP interactions (blue) and suggests new potential connections to test between fly proteins. Thick lines indicate direct interactions; thin lines indicate interactions bridged by a common neighbor. Complex layouts were rendered in Cytoscape [97]. (e) Prediction of GO process annotations using conserved versus yeast-only complexes. Green bars indicate the number of correct predictions and yellow bars the number of incorrect predictions; their sum is the total number of predictions.
Complex and pathway prediction was carried out as described in [31], and results were averaged over five rounds of full tenfold cross-validation.
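The hit criterion in panel (b) can be made concrete with a short sketch. The Python code below assumes that each complex is represented simply as a set of protein identifiers, and it reads the bottom-panel quantity as the fraction of gold-standard proteins that appear in at least one predicted complex. Both are assumptions about representation, not the published implementation; the function names are hypothetical, and only the 0.13 threshold comes from the legend.

```python
from typing import Dict, List, Set

JACCARD_THRESHOLD = 0.13  # hit threshold stated in the Figure 10b legend


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_hit(a: Set[str], b: Set[str], threshold: float = JACCARD_THRESHOLD) -> bool:
    """A pair of complexes counts as a hit when their Jaccard similarity reaches the threshold."""
    return jaccard(a, b) >= threshold


def overlap_summary(predicted: List[Set[str]], gold: List[Set[str]],
                    threshold: float = JACCARD_THRESHOLD) -> Dict[str, float]:
    # Gold-standard complexes matched by at least one predicted complex.
    gold_hit = sum(any(is_hit(g, p, threshold) for p in predicted) for g in gold)
    # Predicted complexes matched by at least one gold-standard complex.
    pred_hit = sum(any(is_hit(p, g, threshold) for g in gold) for p in predicted)
    # Gold-standard proteins present in at least one predicted complex (rough upper bound).
    gold_proteins = set().union(*gold)
    predicted_proteins = set().union(*predicted)
    return {
        "gold_hit_frac": gold_hit / len(gold) if gold else 0.0,
        "pred_hit_frac": pred_hit / len(predicted) if predicted else 0.0,
        "gold_protein_coverage": (len(gold_proteins & predicted_proteins) / len(gold_proteins)
                                  if gold_proteins else 0.0),
    }


if __name__ == "__main__":
    # Hypothetical toy complexes.
    gold = [{"A", "B", "C"}, {"D", "E"}]
    predicted = [{"A", "B"}, {"F", "G"}, {"D", "T", "U", "V", "W", "X", "Y", "Z"}]
    print(overlap_summary(predicted, gold))
```

In this toy example, {D, E} shares one protein with the eight-member predicted complex, giving a Jaccard similarity of 1/9 (about 0.11). That falls just below the threshold, so only one of the two gold-standard complexes is hit even though three of the five gold-standard proteins appear in some predicted complex. This illustrates why protein coverage is only an upper bound on the hit rate.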

Mentions: A variety of computational approaches have been devised to infer protein complexes from partial interaction datasets [31,73-75]. We used the PathBLAST network alignment tool to identify prospective protein complexes in the combined LC-PI and HTP-PI networks as subnetworks of interactions that were significantly more densely connected than would be expected in randomized versions of the same network [31]. This method predicted a total of 539 yeast protein complexes, not counting the 258 definitive biochemically purified complexes already present in the LC-PI dataset (see Additional data file 1). The relative contributions of LC-PI and HTP-PI data to the predicted complexes were assessed by counting the interactions contributed by each dataset (Figure 10a). The LC-PI dataset contributed the majority of the interactions that formed the predicted complexes, indicating that LC interactions have a greater tendency to cluster into complex-like structures. As a second measure of complex enrichment in the LC-PI dataset, we assessed the overlap between the complexes predicted from local interaction density and the 258 biochemically purified gold-standard complexes, again as a function of the contributions from the LC and HTP datasets (Figure 10b). Here again, the LC-PI dataset outperformed the HTP-PI dataset. The minimal overlap of locally dense regions in the LC-PI and HTP-PI datasets was also evident in two-dimensional hierarchical clustering maps of the combined datasets (see Additional data file 3).
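The counting step behind Figure 10a amounts to tallying, for each predicted complex, what share of its interactions comes from each source. A minimal Python sketch follows. It assumes, hypothetically, that a complex's interactions are the combined-network edges whose two endpoints both belong to the complex, and it ignores self-interactions. An interaction present in both LC-PI and HTP-PI counts toward both percentages, which is why the two distributions need not be complements. The function names, the 10% bin width and the toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def to_pairs(edges: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Store undirected interactions as unordered pairs."""
    return {frozenset(edge) for edge in edges}


def source_percentages(members: Set[str], lc: Set[Pair], htp: Set[Pair]) -> Tuple[float, float]:
    """Percent of a complex's internal interactions supported by LC and by HTP data."""
    candidate = {frozenset(p) for p in combinations(sorted(members), 2)}
    internal = candidate & (lc | htp)  # combined-network edges inside the complex
    if not internal:
        return 0.0, 0.0
    n = len(internal)
    # An edge in both sets counts toward both percentages, so they can sum past 100.
    return 100 * len(internal & lc) / n, 100 * len(internal & htp) / n


def bin_percentages(values: Iterable[float], width: int = 10) -> Counter:
    """Histogram with bins [0, 10), [10, 20), ..., [90, 100]; 100 falls in the top bin."""
    return Counter(min(int(v // width) * width, 100 - width) for v in values)


if __name__ == "__main__":
    # Hypothetical toy networks and complexes.
    lc = to_pairs([("A", "B"), ("B", "C"), ("A", "C")])
    htp = to_pairs([("A", "B"), ("C", "D")])
    complexes = [{"A", "B", "C"}, {"C", "D", "E"}]
    shares = [source_percentages(c, lc, htp) for c in complexes]
    print(shares)
    print(bin_percentages(lc_pct for lc_pct, _ in shares))
```

In the toy data, the first complex is 100% supported by LC edges and about 33% by HTP edges, because the A-B interaction appears in both sets. Binning the LC percentages across all complexes yields a histogram of the kind summarized in panel (a).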