Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M - J. Biol. (2006)

Bottom Line: Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. We show that the LC dataset considerably improves the predictive power of network-analysis approaches.


Affiliation: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada. teresa.reguly@utoronto.ca

ABSTRACT

Background: The study of complex biological networks and prediction of gene function have been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well-substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.

Results: We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.

Conclusion: Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.



Figure 5: Scale-free degree distribution of physical and genetic interaction networks. (a) Frequency-degree plots of LC, HTP and combined networks. Degree is the connectivity (k) for each node, and frequency indicates the probability of finding a node with a given degree. The linear fit for each plot approximates a power-law distribution. (b) Rank-degree plots of LC, HTP, and combined networks. Each data point actually represents many nodes that have the same degree. The fit of the data to either linear (lin) or exponential (exp) curves is indicated for each plot and the coefficient of determination (R2) is reported in parentheses for each curve fit. Note that although the tail of each distribution exhibits a large deviation, only a small portion of the network is represented by the highly connected nodes in the tail region. For example, approximately 2% of nodes in the LC-PI and HTP-PI networks have connectivity greater than 30.

Mentions: In a scale-free network, some nodes are highly connected whereas most nodes have few connections. Such networks follow an apparent power-law distribution that may arise as a consequence of preferential attachment of new nodes to well-connected hubs, which are critical for the stability of the overall network [18,19,21-23]. Connectivity influences the way a network operates, including how it responds to catastrophic events, such as ablation of gene or protein function. Previous analysis of the yeast HTP protein-interaction dataset suggested that the overall network behaves in a scale-free manner [22,23]. Both the LC-PI and the HTP-PI datasets essentially followed a scale-free degree distribution, either alone or in combination (Figure 5a). We note, however, that the frequency-degree log plots did not yield a perfectly linear fit for the LC network, which showed a higher-than-expected concentration of nodes with connectivity of 10–12. If analysis of the LC network was restricted to nodes with connectivity less than 20 (which represent more than 95% of the data), then the log-linear fit was much better. Similarly, both the LC-GI and HTP-GI genetic networks, either alone or in combination, followed an apparent power-law distribution (Figure 5a), as shown previously for an HTP-GI network [12].
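The frequency-degree analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `degree_counts` and `loglog_fit` and the `max_degree` parameter are hypothetical, and an ordinary least-squares line through the log-log points is assumed here as a simple stand-in for the "linear fit" mentioned in the Figure 5 caption. The sketch counts the degree k of each node from an undirected edge list, converts the counts to frequencies P(k), and fits a line to (log10 k, log10 P(k)), optionally restricting the fit to nodes with connectivity below a cutoff, as in the restriction to connectivity less than 20.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return {node: degree} for an undirected edge list.

    Self-loops are ignored and repeated pairs are counted once.
    """
    unique_pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degrees = Counter()
    for pair in unique_pairs:
        for node in pair:
            degrees[node] += 1
    return dict(degrees)


def loglog_fit(degrees, max_degree=None):
    """Fit a least-squares line to (log10 k, log10 P(k)).

    P(k) is the fraction of all nodes that have degree k. If max_degree
    is given, only degrees strictly below it enter the fit.
    Returns (slope, intercept, r_squared).
    """
    n_nodes = len(degrees)
    freq = Counter(degrees.values())
    points = [
        (math.log10(k), math.log10(count / n_nodes))
        for k, count in sorted(freq.items())
        if max_degree is None or k < max_degree
    ]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination for the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy network (made-up node names): one hub plus a few sparse nodes.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
    toy_degrees = degree_counts(toy_edges)
    print(toy_degrees)
    print(loglog_fit(toy_degrees))
```

A negative slope with a high coefficient of determination is consistent with an apparent power law, but a straight-line fit on log-log axes is only a rough diagnostic. Comparing the fit with and without `max_degree` shows how strongly the sparse tail of highly connected nodes influences the result.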

