An automated method for finding molecular complexes in large protein interaction networks.

Bader GD, Hogue CW - BMC Bioinformatics (2003)

Bottom Line: As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.


Affiliation: Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto ON Canada M5G 1X5; Dept. of Biochemistry, University of Toronto, Toronto ON Canada M5S 1A8. gary.bader@utoronto.ca

ABSTRACT

Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery.

Results: This paper describes a novel graph-theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. Unlike other graph clustering methods, the algorithm has a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.

Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
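
The two steps named in the Results paragraph, vertex weighting by local neighborhood density and outward traversal from a dense seed, can be illustrated with a short Python sketch. This is not the published MCODE implementation: the abstract does not give the weighting formula, so the weight used here (density of a vertex's closed neighborhood multiplied by its degree) is a stand-in chosen for illustration, and the function names and the `tolerance` and `min_size` parameters are hypothetical.

```python
from collections import deque


def neighborhood_density(graph, v):
    """Edge density of the subgraph induced by v and its direct neighbors."""
    nodes = graph[v] | {v}
    n = len(nodes)
    if n < 2:
        return 0.0
    # Every undirected edge is counted from both endpoints, so halve the sum.
    edges = sum(len(graph[u] & nodes) for u in nodes) // 2
    return edges / (n * (n - 1) / 2)


def grow_cluster(graph, weights, seed, seen, tolerance):
    """Traverse outward from seed, admitting unassigned vertices whose
    weight is at least (1 - tolerance) times the seed's weight."""
    cutoff = weights[seed] * (1 - tolerance)
    cluster = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in cluster and w not in seen and weights[w] >= cutoff:
                cluster.add(w)
                queue.append(w)
    return cluster


def find_dense_regions(graph, tolerance=0.2, min_size=3):
    """Return dense regions as sets of vertices, seeded in order of weight."""
    # Assumed stand-in weight: neighborhood density times degree.
    weights = {v: neighborhood_density(graph, v) * len(graph[v]) for v in graph}
    seen = set()
    clusters = []
    # Highest-weight vertices seed first; ties are broken by name.
    for seed in sorted(graph, key=lambda v: (-weights[v], v)):
        if seed in seen:
            continue
        cluster = grow_cluster(graph, weights, seed, seen, tolerance)
        seen |= cluster
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Toy undirected network: a four-protein clique (A-D) with a tail D-E-F.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
dense_regions = find_dense_regions(toy_network)
```

On the toy network the clique is recovered as one region, while the sparsely connected tail proteins fall below the seed's weight cutoff and are left out.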


Figure 9: Effect of Complex Score Threshold on MCODE Prediction Accuracy. MCODE complexes with a score equal to or greater than a given threshold were compared to a benchmark comprising the combined MIPS and Gavin benchmarks. Accuracy was calculated as the number of predicted complexes at or above the threshold score that matched a known complex, divided by the total number of predicted complexes (matching and non-matching) at that threshold. A predicted complex was deemed to match a known complex if it had an overlap score above 0.2. The number of predicted complexes that matched known complexes at each score threshold is shown as labels on the plot.

Mentions: To evaluate the effectiveness of our scoring scheme, which scores larger, denser complexes higher than smaller, sparser complexes, we examined the accuracy of MCODE predictions at various score thresholds. As the score threshold for inclusion of complexes is increased, fewer complexes are included, but a higher percentage of the included complexes match complexes in the benchmark. This comes at the expense of sensitivity, as many benchmark-matching complexes are not included at higher score thresholds (Figure 9). For example, of the ten predicted complexes with an MCODE score greater than or equal to six, nine match a known complex in either the MIPS or Gavin benchmark above a 0.2 threshold overlap score, yielding an accuracy of 90%. All five complexes with an MCODE score greater than or equal to seven matched known complexes. Thus, complexes that score highly on our simple density-based scoring scheme are very likely to be real.
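
The accuracy calculation described above can be written out as a small Python sketch. The excerpt does not define the overlap score, so the form used in `overlap_score` (shared members squared, divided by the product of the two complex sizes) is an assumption, as are the function names. The individual scores and overlaps in `example_predictions` are invented for illustration; only the counts they reproduce (nine of ten matches at a score of six or more, five of five at seven or more) come from the text.

```python
def overlap_score(predicted, known):
    """Assumed overlap measure: shared^2 / (|predicted| * |known|)."""
    shared = len(predicted & known)
    return shared ** 2 / (len(predicted) * len(known))


def accuracy_at_threshold(predictions, score_threshold, overlap_cutoff=0.2):
    """Fraction of predicted complexes scoring at or above score_threshold
    whose best benchmark overlap is above overlap_cutoff.

    predictions: list of (mcode_score, best_overlap) pairs.
    Returns None when no prediction reaches the threshold.
    """
    kept = [(score, overlap) for score, overlap in predictions
            if score >= score_threshold]
    if not kept:
        return None
    matched = sum(1 for _, overlap in kept if overlap > overlap_cutoff)
    return matched / len(kept)


# Illustrative (mcode_score, best_overlap) pairs; the values are made up.
example_predictions = [
    (7.5, 0.9), (7.2, 0.6), (7.1, 0.5), (7.0, 0.4), (7.0, 0.3),
    (6.8, 0.7), (6.5, 0.25), (6.3, 0.1), (6.1, 0.5), (6.0, 0.35),
    (4.0, 0.0), (3.5, 0.3),
]

accuracy_at_six = accuracy_at_threshold(example_predictions, 6)      # 9 of 10
accuracy_at_seven = accuracy_at_threshold(example_predictions, 7)    # 5 of 5
```

Raising the threshold from six to seven lifts accuracy from 0.9 to 1.0 in this example while halving the number of complexes retained, which is the sensitivity trade-off the passage describes.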

