Limits...
An automated method for finding molecular complexes in large protein interaction networks.

Bader GD, Hogue CW - BMC Bioinformatics (2003)

Bottom Line: As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes.The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.

View Article: PubMed Central - HTML - PubMed

Affiliation: Samuel Lunenfeld Research Institute, Mt, Sinai Hospital, Toronto ON Canada M5G 1X5, Dept, of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8. gary.bader@utoronto.ca

ABSTRACT

Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.

Results: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.

Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.

Show MeSH
Effect of Overlap Score Threshold on Number of Predicted and Matched Known Complexes for the Gavin Evaluation Figure legend: Average and maximum number of predicted and matched known complexes seen during MCODE parameter optimization (840 parameter combinations) plotted as a function of overlap score threshold. As the stringency for the closeness that a predicted complex must match a known complex is increased (increase in overlap score), fewer predicted complexes match known complexes. Note that these curves do not correspond to the best parameter set, but rather are an average of results from all tried parameter combinations.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC149346&req=5

Figure 1: Effect of Overlap Score Threshold on Number of Predicted and Matched Known Complexes for the Gavin Evaluation Figure legend: Average and maximum number of predicted and matched known complexes seen during MCODE parameter optimization (840 parameter combinations) plotted as a function of overlap score threshold. As the stringency for the closeness that a predicted complex must match a known complex is increased (increase in overlap score), fewer predicted complexes match known complexes. Note that these curves do not correspond to the best parameter set, but rather are an average of results from all tried parameter combinations.

Mentions: To choose an overlap score that maximizes biological relevance of the predicted complexes without filtering away too many predictions, each of the 840 parameter combinations tested during the parameter optimization stage. The number of MCODE predicted complexes was plotted against the number of matched known complexes over a range of ω thresholds from 'no threshold' to 0.1 to 0.9 (in 0.1 increments). If no ω threshold is used, a predicted complex only needs at least one protein in common with a known complex to be considered a match. If predicted and known complexes are only counted as a match when their ω is above a specific threshold, the number of matched complexes declines with increasing ω threshold, as shown in Figure 1. Interestingly, the average and maximum number of matched known complexes drops more quickly from zero until a ω threshold of 0.2 than from 0.2 to 0.9 indicating that many predicted complexes only have one or a few proteins that overlap with known complexes. A ω threshold of 0.2 to 0.3 thus seems to filter out most predicted complexes that have insignificant overlap with known complexes.


An automated method for finding molecular complexes in large protein interaction networks.

Bader GD, Hogue CW - BMC Bioinformatics (2003)

Effect of Overlap Score Threshold on Number of Predicted and Matched Known Complexes for the Gavin Evaluation Figure legend: Average and maximum number of predicted and matched known complexes seen during MCODE parameter optimization (840 parameter combinations) plotted as a function of overlap score threshold. As the stringency for the closeness that a predicted complex must match a known complex is increased (increase in overlap score), fewer predicted complexes match known complexes. Note that these curves do not correspond to the best parameter set, but rather are an average of results from all tried parameter combinations.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC149346&req=5

Figure 1: Effect of Overlap Score Threshold on Number of Predicted and Matched Known Complexes for the Gavin Evaluation Figure legend: Average and maximum number of predicted and matched known complexes seen during MCODE parameter optimization (840 parameter combinations) plotted as a function of overlap score threshold. As the stringency for the closeness that a predicted complex must match a known complex is increased (increase in overlap score), fewer predicted complexes match known complexes. Note that these curves do not correspond to the best parameter set, but rather are an average of results from all tried parameter combinations.
Mentions: To choose an overlap score that maximizes biological relevance of the predicted complexes without filtering away too many predictions, each of the 840 parameter combinations tested during the parameter optimization stage. The number of MCODE predicted complexes was plotted against the number of matched known complexes over a range of ω thresholds from 'no threshold' to 0.1 to 0.9 (in 0.1 increments). If no ω threshold is used, a predicted complex only needs at least one protein in common with a known complex to be considered a match. If predicted and known complexes are only counted as a match when their ω is above a specific threshold, the number of matched complexes declines with increasing ω threshold, as shown in Figure 1. Interestingly, the average and maximum number of matched known complexes drops more quickly from zero until a ω threshold of 0.2 than from 0.2 to 0.9 indicating that many predicted complexes only have one or a few proteins that overlap with known complexes. A ω threshold of 0.2 to 0.3 thus seems to filter out most predicted complexes that have insignificant overlap with known complexes.

Bottom Line: As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes.The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.

View Article: PubMed Central - HTML - PubMed

Affiliation: Samuel Lunenfeld Research Institute, Mt, Sinai Hospital, Toronto ON Canada M5G 1X5, Dept, of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8. gary.bader@utoronto.ca

ABSTRACT

Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.

Results: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.

Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.

Show MeSH