A complex-based reconstruction of the Saccharomyces cerevisiae interactome.

Wang H, Kakaradov B, Collins SR, Karotki L, Fiedler D, Shales M, Shokat KM, Walther TC, Krogan NJ, Koller D - Mol. Cell Proteomics (2009)

Bottom Line: This study makes two contributions toward this goal. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We show that our complex-level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network.


Affiliation: Computer Science Department, Stanford University, Stanford, California 94305, USA.

ABSTRACT
Most cellular processes are performed by proteomic units that interact with each other. These units are often stoichiometrically stable complexes composed of several proteins. To obtain a faithful view of the protein interactome we must view it in terms of these basic units (complexes and proteins) and the interactions between them. This study makes two contributions toward this goal. First, it provides a new algorithm for reconstruction of stable complexes from a variety of heterogeneous biological assays; our approach combines state-of-the-art machine learning methods with a novel hierarchical clustering algorithm that allows clusters to overlap. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We provide experimental support for some of our novel predictions, identifying both a new complex involved in nutrient starvation and a new component of the eisosome complex. Second, we provide a high-accuracy algorithm for the novel problem of predicting transient interactions involving complexes. We show that our complex-level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network. In particular, we reinterpret the finding that "hubs" in the network are enriched for being essential, showing instead that essential proteins tend to be clustered together in essential complexes and that these essential complexes tend to be large.


Figure 7: Relationship between complex size and essentiality.

a, fraction of complexes with different essentiality fractions. Each complex is represented by its size and the fraction of its components that are essential. The different colors represent different ratios of essentiality in a complex, discretized into five bins. The x axis represents the complex size, and the y axis represents the fraction of complexes of that size that have this particular essentiality ratio. The large majority of complexes of size 2 have an essentiality ratio in the range 0–0.2, whereas larger complexes tend to have a larger essentiality ratio. Also shown on the x axis, in parentheses, is the number of complexes in each category (e.g. there are 54 complexes of size 3).

b, the relationship between complex size and the proportion of essential proteins in complexes of that size. The x axis is the size bin of the complexes. The y axis is the proportion of essential proteins in all complexes within the size bin. Larger complexes tend to have a higher proportion of essential proteins.

c, evaluation of different metrics as predictors of essentiality: size of the largest enclosing complex versus degree in the protein-protein interaction network (hubness). For the red and light blue curves, we rank each protein based on the size of the largest complex to which it belongs; the red curve uses predicted complexes, and the light blue curve uses the reference complexes. For the blue and green curves, we use the hubness, i.e. the degree of the protein in a protein-protein interaction network; the blue curve uses the yeast two-hybrid (Y2H) protein-protein interaction network, and the green curve uses a network where pairs are connected if they have a scaled PE score >0.5. The x axis is the number of essential proteins among the K top-ranked proteins (for different values of K), and the y axis is the number of non-essential proteins. Complex size in our predicted complexes (red) is the best predictor of essentiality. The hubness based on PE score (green) performs better than the other metrics presumably because it also correlates directly with co-membership in a complex. The reference complexes (light blue) perform slightly worse but considerably better than interactions in the Y2H data.
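The ranking evaluation described for panel c can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' code: the function names, the tie-breaking rule (alphabetical by protein name), and the toy complexes, edges, and essential set are all hypothetical. Each protein is scored either by the size of the largest complex containing it or by its degree in an interaction network; proteins are then ranked by descending score, and the numbers of essential and non-essential proteins among the top K are accumulated for every K.

```python
from typing import Dict, List, Set, Tuple


def largest_complex_size(complexes: List[Set[str]]) -> Dict[str, int]:
    """Map each protein to the size of the largest complex containing it."""
    score: Dict[str, int] = {}
    for members in complexes:
        for protein in members:
            score[protein] = max(score.get(protein, 0), len(members))
    return score


def degree(edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Hubness: number of interaction partners per protein (edges assumed unique)."""
    deg: Dict[str, int] = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def essentiality_curve(
    score: Dict[str, int], essential: Set[str]
) -> List[Tuple[int, int]]:
    """For K = 1..N, return (essential, non-essential) counts in the top K."""
    # Assumption: ties are broken by protein name so the ranking is deterministic.
    ranked = sorted(score, key=lambda p: (-score[p], p))
    curve: List[Tuple[int, int]] = []
    n_essential = n_nonessential = 0
    for protein in ranked:
        if protein in essential:
            n_essential += 1
        else:
            n_nonessential += 1
        curve.append((n_essential, n_nonessential))
    return curve


# Toy data, hypothetical and for illustration only.
complexes = [{"A", "B", "C", "D"}, {"D", "E"}, {"F", "G"}]
edges = [("A", "E"), ("E", "F"), ("E", "G"), ("E", "D")]
essential = {"A", "B", "C"}

size_curve = essentiality_curve(largest_complex_size(complexes), essential)
hub_curve = essentiality_curve(degree(edges), essential)
```

A curve that accumulates essential proteins quickly while adding few non-essential ones corresponds to a better predictor, which is the sense in which the caption compares the colored curves.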
© Copyright Policy - open-access


Mentions: Much discussion has occurred regarding the relationship between essentiality and the structure of the protein-protein interaction network. Early work of Jeong et al. (26) and Han et al. (84) found that hub proteins in a protein-protein interaction network are more likely to be encoded by essential genes. More recent work (85) suggests that highly connected proteins are simply more likely to participate in essential protein-protein interactions and are therefore more likely to be essential. However, deeper insight into the relationship between the protein network and essentiality can be obtained by considering the network at the level of complexes rather than pairwise interactions. Such an analysis was recently performed by Hart et al. (5), who showed that essential proteins are concentrated in certain complexes, resulting in a dichotomy of essential and non-essential complexes. This phenomenon was also found in our predicted complexes (Fig. 7a). However, that finding does not explain why hubs in the network are more likely to be essential. We therefore looked into the distribution of essential proteins in complexes of different sizes and found that the fraction of essential components in a complex tends to increase with complex size (Fig. 7b). Moreover, when we aggregate over all complexes of a given size, larger complexes tend to have a far greater proportion of essential proteins among their components (Fig. 7b). Components in a large complex are naturally highly connected in the protein interaction network and therefore often form hubs. Thus, the finding regarding the essentiality of hubs very likely arises from the fact that large complexes are more likely to have a much higher ratio of essential genes. Our finding is consistent with the recent work of Zotenko et al. (86), who argue that essential hubs are often members of a densely connected set of proteins performing an essential cellular function. However, their analysis is still performed on the pairwise protein network and hence is unable to identify the strong dependence between the size of a complex and its essentiality.
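The size-versus-essentiality summaries discussed above can be illustrated with a short Python sketch. It rests on assumptions that the text does not settle: a protein belonging to several complexes is counted once per complex when aggregating by size, the five essentiality bins are taken to be equal-width with the top bin closed at 1.0, and all names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def essential_fraction(members: Set[str], essential: Set[str]) -> float:
    """Fraction of a complex's components that are essential."""
    return sum(p in essential for p in members) / len(members)


def essentiality_bin(fraction: float, n_bins: int = 5) -> int:
    """Discretize a fraction in [0, 1] into n_bins equal-width bins (0-based).

    Assumption: a fraction of exactly 1.0 falls in the top bin.
    """
    return min(int(fraction * n_bins), n_bins - 1)


def proportion_by_size(
    complexes: List[Set[str]], essential: Set[str]
) -> Dict[int, float]:
    """Proportion of essential proteins aggregated over all complexes of each size.

    Assumption: a protein in several complexes is counted once per complex.
    """
    n_essential: Dict[int, int] = defaultdict(int)
    n_total: Dict[int, int] = defaultdict(int)
    for members in complexes:
        size = len(members)
        n_total[size] += size
        n_essential[size] += sum(p in essential for p in members)
    return {size: n_essential[size] / n_total[size] for size in sorted(n_total)}


# Toy data, hypothetical and for illustration only.
complexes = [{"A", "B", "C", "D"}, {"D", "E"}, {"F", "G"}]
essential = {"A", "B", "C"}

per_complex_bins = [
    essentiality_bin(essential_fraction(c, essential)) for c in complexes
]
by_size = proportion_by_size(complexes, essential)
```

In this toy example the single complex of size 4 has three essential components out of four, whereas neither complex of size 2 contains an essential protein, mirroring the direction of the trend reported for the real data.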

