A complex-based reconstruction of the Saccharomyces cerevisiae interactome.

Wang H, Kakaradov B, Collins SR, Karotki L, Fiedler D, Shales M, Shokat KM, Walther TC, Krogan NJ, Koller D - Mol. Cell Proteomics (2009)

Bottom Line: This study makes two contributions toward this goal. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We show that our complex level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network.


Affiliation: Computer Science Department, Stanford University, Stanford, California 94305, USA.

ABSTRACT
Most cellular processes are performed by proteomic units that interact with each other. These units are often stoichiometrically stable complexes comprised of several proteins. To obtain a faithful view of the protein interactome we must view it in terms of these basic units (complexes and proteins) and the interactions between them. This study makes two contributions toward this goal. First, it provides a new algorithm for reconstruction of stable complexes from a variety of heterogeneous biological assays; our approach combines state-of-the-art machine learning methods with a novel hierarchical clustering algorithm that allows clusters to overlap. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We provide experimental support for some of our novel predictions, identifying both a new complex involved in nutrient starvation and a new component of the eisosome complex. Second, we provide a high accuracy algorithm for the novel problem of predicting transient interactions involving complexes. We show that our complex level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network. In particular, we reinterpret the finding that "hubs" in the network are enriched for being essential, showing instead that essential proteins tend to be clustered together in essential complexes and that these essential complexes tend to be large.


Figure 3: Coherence of our predicted complexes. We computed the functional coherence between proteins in the same complex against external data sources that are not used in training. More coherent complexes have a smaller difference in protein abundance, have a smaller semantic distance in GO biological process, share more transcriptional regulators, and have a higher growth fitness correlation. The y axis shows the values for these metrics of functional coherence; also shown is the performance of random pairs (thick horizontal line). Our predicted set of complexes significantly outperforms other state-of-the-art methods. For GO biological processes, our complexes have a semantic distance 8% and 17% lower than the methods of Hart et al. (5) and Pu et al. (6), respectively. For protein abundance, the improvement over Hart et al. (5) and Pu et al. (6) is 5% and 10%, respectively; conversely, our complexes are 12% less coherent than the top affinity pairs, suggesting that proteins with lower affinity scores can be members of the complex but also play other roles in the cell, reducing their correlation with other proteins in the same complex. For the correlation of growth phenotypes across different conditions, our predicted complexes are 19% and 31% more coherent than those of Hart et al. (5) and Pu et al. (6), respectively, a very significant improvement. Finally, protein pairs within our complexes on average share 30% and 59% more transcription factors than those of Hart et al. (5) and Pu et al. (6), respectively. The comparison with the reference complexes shows that our complexes are considerably more coherent on regulator overlap and perform similarly on correlation of abundance and growth phenotype. Conversely, our complexes are 21% less coherent than the reference complexes on GO biological process annotations; this is not surprising, as the reference complexes and GO annotations are derived (at least in part) from similar data sources, such as literature and small scale experiments.

Mentions: We validate our predictions by looking at various measures of biological coherence (Fig. 3): similarity of GO biological process, similarity in the level of protein abundance for different complex components, correlation of growth defect profiles across a broad range of conditions, and co-regulation as measured by sharing of transcription factors. For all measures, HACO with our affinity function considerably outperformed all other approaches with the method of Hart et al. (5) being the closest competitor. Most striking were the improvements in correlation of growth phenotypes across multiple conditions and in coherence of the transcriptional regulation program. To specifically test our complex formation process, we also compared pairs of co-complexed proteins with pairs that have high affinity (as computed by our boosting algorithm). The results were largely comparable with the notable exception of protein abundance where our complexes are 12% less coherent than the top affinity pairs; this suggests that proteins with lower affinity scores can be members of the complex but also play other roles in the cell, reducing their correlation with other proteins in the same complex. The comparison with the reference complexes is also interesting. Our complexes are considerably more coherent than the reference complexes on regulator overlap and perform similarly on correlation of abundance and growth phenotype. Conversely our complexes are significantly less coherent than the reference complexes on GO biological process annotations; this is not surprising as the reference complexes and GO annotations are derived (at least in part) from similar data sources, such as literature and small scale experiments. 
Overall, when comparing with data sources that were not used in constructing the reference complexes, our predictions seem to perform as well as or better than the reference set, suggesting that our predictions provide a strong set of complexes that can be used as a new reference.
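To make the pairwise coherence measures above concrete, the following is a minimal sketch of how two of them might be computed from a set of predicted complexes: the mean Pearson correlation of growth profiles and the mean number of shared transcription factors over co-complexed protein pairs. It is an illustration under stated assumptions, not the procedure used in the study. The function names, the input layout (a list of member lists, a dictionary of growth profiles, a dictionary of regulator sets), and the toy values are all hypothetical, and the paper's exact metric definitions may differ.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def co_complexed_pairs(complexes):
    """Unordered protein pairs that share at least one complex.

    Complexes may overlap; a pair found in several complexes is counted once.
    """
    pairs = set()
    for members in complexes:
        pairs.update(combinations(sorted(set(members)), 2))
    return pairs


def growth_coherence(complexes, profiles):
    """Mean growth-profile correlation over co-complexed pairs with data (assumed metric)."""
    scores = [
        pearson(profiles[a], profiles[b])
        for a, b in co_complexed_pairs(complexes)
        if a in profiles and b in profiles
    ]
    return mean(scores) if scores else float("nan")


def regulator_overlap(complexes, regulators):
    """Mean number of transcription factors shared by co-complexed pairs (assumed metric)."""
    counts = [
        len(regulators.get(a, set()) & regulators.get(b, set()))
        for a, b in co_complexed_pairs(complexes)
    ]
    return mean(counts) if counts else float("nan")


def relative_gain(ours, other):
    """Percentage by which `ours` exceeds `other`."""
    return 100.0 * (ours - other) / other


# Toy, made-up inputs; protein C belongs to two overlapping complexes.
toy_complexes = [["A", "B", "C"], ["C", "D"]]
toy_profiles = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0],
                "C": [1.0, 2.0, 3.0], "D": [3.0, 2.0, 1.0]}
toy_regulators = {"A": {"T1", "T2"}, "B": {"T1"}, "C": {"T1", "T2"}, "D": set()}

print(growth_coherence(toy_complexes, toy_profiles))
print(regulator_overlap(toy_complexes, toy_regulators))
```

Running the same functions on random protein pairs would give a baseline analogous to the thick horizontal line in the figure, and `relative_gain` expresses a comparison between two methods as a percentage, in the style of the differences quoted in the caption.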

