An automated method for finding molecular complexes in large protein interaction networks.

Bader GD, Hogue CW - BMC Bioinformatics (2003)

Bottom Line: As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.


Affiliation: Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto ON Canada M5G 1X5; Dept. of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8. gary.bader@utoronto.ca

ABSTRACT

Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.

Results: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.

Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
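
The two stages named in the Results paragraph, vertex weighting by local neighborhood density and outward traversal from a seed, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MCODE implementation: the weight used here is the plain density of a vertex's closed neighborhood, and the inclusion rule (accept a neighbor whose weight is at least `(1 - vwp)` times the seed's weight) is an assumed reading of the VWP parameter. The function names and the toy graph are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def neighborhood_density(graph, v):
    """Density of the subgraph induced by v and its direct neighbors.

    Assumption: a simplified stand-in for a vertex weight based on
    local neighborhood density. Assumes no self-loops.
    """
    nodes = graph[v] | {v}
    n = len(nodes)
    if n < 2:
        return 0.0
    # Every edge is counted once from each endpoint, so halve the total.
    edge_count = sum(len(graph[u] & nodes) for u in nodes) // 2
    return 2.0 * edge_count / (n * (n - 1))


def vertex_weights(graph):
    """Weight every vertex by its neighborhood density."""
    return {v: neighborhood_density(graph, v) for v in graph}


def grow_from_seed(graph, weights, seed, vwp):
    """Breadth-first traversal outward from the seed.

    Assumption: a neighbor joins the cluster only if its weight is at
    least (1 - vwp) times the seed weight.
    """
    threshold = weights[seed] * (1.0 - vwp)
    cluster = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in cluster and weights[w] >= threshold:
                cluster.add(w)
                queue.append(w)
    return cluster


# Hypothetical toy network: two 4-cliques joined by the single edge d-e.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
    ("d", "e"),
]
toy_graph = build_graph(toy_edges)
toy_weights = vertex_weights(toy_graph)

strict_cluster = grow_from_seed(toy_graph, toy_weights, "a", vwp=0.0)
relaxed_cluster = grow_from_seed(toy_graph, toy_weights, "a", vwp=0.5)
```

On this toy graph, the strict threshold keeps the traversal inside the seed's clique (the bridge vertex `d` is left out because the bridge dilutes its neighborhood), while the relaxed threshold lets the traversal cross into the second clique, the kind of merging that Figure 11A illustrates.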


Figure 11: MCODE in Directed Mode. MCODE was used in directed mode to further study the complex in Figure 10 by using seed vertices from high-density regions of the two parts of this complex. A) The result of examining the Lsm complex using MCODE parameters that are too relaxed (haircut = TRUE, fluff = FALSE, VWP = 0.05). B) The final Lsm complex using MCODE parameters of haircut = TRUE, fluff = FALSE and VWP = 0, seeded with Lsm4. C) The final 26S proteasome complex, seeded with Rpt1, using MCODE parameters haircut = TRUE, fluff = TRUE and VWP = 0.2. Visible here are two regions of density in this complex corresponding to the 20S proteolytic subunit (left side – mainly Pre proteins) and the 19S regulatory subunit (right side – mainly Rpt and Rpn proteins).

Mentions: To simulate an obvious example where the directed mode of MCODE would be useful, MCODE was run on the AllYeast network with parameters more relaxed than the best parameters (haircut = TRUE, fluff = TRUE, VWP = 0.05 and a fluff density threshold of 0.2). The resulting fourth-highest-ranked complex, when visualized, shows two clustered components and represents two protein complexes, the proteasome and an RNA processing complex, both found in the nucleus (Figure 10). This is a case where a lower VWP parameter would have been superior, since it would have divided this large complex into two more functionally related complexes. The highest-weighted vertices at the center of the two dense regions in Figure 10 are the Rpt1 and Lsm4 proteins. MCODE was run in directed mode starting with these two proteins over a range of VWP parameters from 0 to 0.2, in 0.05 increments. For Lsm4, the parameter set of haircut = TRUE, fluff = FALSE, VWP = 0 was used to find a core complex, which contained 9 proteins fully connected to each other (Dcp1, Kem1, Lsm2, Lsm3, Lsm4, Lsm5, Lsm6, Lsm7 and Pat1). Above this VWP value, the core complex branched out into proteasome subunit proteins, which are not part of the Lsm complex (see Figure 11A). With VWP held at 0, combinations of haircut and fluff parameters were used to further expand the core complex. This process was stopped when the predicted complexes began to include proteins whose known biological function differed sufficiently from that of the seed vertex. Proteins such as Vam6 and Yor320c were included in the complex at moderate fluff parameters (0.4–0.6) but not at higher fluff parameters; these are known to be localized in membranes outside of the nucleus and are thus likely not functionally related to the Lsm complex proteins. Therefore, the 9 proteins listed above were taken to be the final complex (Figure 11B). This is intuitive because of their maximal density (a 9-clique).
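
The closing remark about maximal density can be checked with a few lines of Python. The sketch below assumes the standard simple-graph density 2E / (V(V - 1)), which may differ in detail from the density measure MCODE uses internally. The edge set is built from the text's statement that the 9 proteins are fully connected to each other, not from interaction data, and the helper names are hypothetical. It also enumerates the VWP values of the sweep described above (0 to 0.2 in 0.05 increments).

```python
from itertools import combinations

# The 9 core proteins named in the text.
LSM_CORE = ["Dcp1", "Kem1", "Lsm2", "Lsm3", "Lsm4",
            "Lsm5", "Lsm6", "Lsm7", "Pat1"]


def density(nodes, edges):
    """Simple-graph density: 2E / (V(V - 1)). Assumed definition."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2.0 * len(edges) / (n * (n - 1))


def is_clique(nodes, edges):
    """True if every pair of nodes is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


# Edges reconstructed from "fully connected to each other": one undirected
# edge per unordered pair of proteins (illustrative, not measured data).
core_edges = {frozenset(pair) for pair in combinations(LSM_CORE, 2)}

# VWP values for the directed-mode sweep: 0 to 0.2 in steps of 0.05.
VWP_GRID = [round(i * 0.05, 2) for i in range(5)]

core_density = density(LSM_CORE, core_edges)
core_is_clique = is_clique(LSM_CORE, core_edges)
```

A 9-clique has 9 × 8 / 2 = 36 edges, so its density under this definition is 1, the largest value possible; removing any single edge drops the density below 1 and breaks the clique property.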

