Limits...
Modifying the DPClus algorithm for identifying protein complexes based on new topological structures.

Li M, Chen JE, Wang JX, Hu B, Chen G - BMC Bioinformatics (2008)

Bottom Line: Identification of protein complexes is crucial for understanding principles of cellular organization and functions.As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China. limin@mail.csu.edu.cn

ABSTRACT

Background: Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.

Results: Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density. Following the approach of that of the previously proposed clustering algorithm DPClus which expands clusters starting from seeded vertices, we present a clustering algorithm IPCA based on the new topological structure for identifying complexes in large protein interaction networks. The algorithm IPCA is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes. Experimental results show that the algorithm IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM.

Conclusion: The proposed algorithm based on the new topological structure makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rate of false positives and false negatives in data from high-throughout interaction techniques. The program is available at http://netlab.csu.edu.cn/bioinformatics/limin/IPCA.

Show MeSH
The robustness of IPCA against random edges addition and removal. (a) Various proportions of edges added to the protein interaction network randomly, (b) Various proportions of edges removed from the protein interaction network randomly.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2570695&req=5

Figure 7: The robustness of IPCA against random edges addition and removal. (a) Various proportions of edges added to the protein interaction network randomly, (b) Various proportions of edges removed from the protein interaction network randomly.

Mentions: In this analysis, we evaluated the robustness of the algorithm IPCA to various levels of graph alterations. Since all the methods of PPIs (Protein-Protein Interactions) prediction are known to yield a non-negligible amount of noise (false positives) and to miss a fraction of existing interactions (false negatives) [24], we tested the robustness of IPCA to false positive by adding edges randomly and to false negatives by removing edges randomly. Proportions of edges (0%, 10%, 20%, 30%,..., 90% and 100%) were added to the yeast protein interaction network randomly, and the same proportions (except that of 100%) of edges were removed from the yeast network randomly. It should be expected that the false positives would not randomly contribute to the formation of dense sub-graphs, and that the number of matched known complexes does not decrease fast with the increasing of false negatives. Figure 7 displays the impact of edge addition and removal on the results of the algorithm IPCA. As one can see, IPCA is barely affected by addition of up to 100% edges. It is also affected faintly by removal of up to 50% edges. It starts to drop perceivably from 60%, and a fast drop starts from 80%. However, there are still 93 known complexes matched to the predicted clusters (Tin = 0.9 and Tin = 0.6) when 80% edges are removed. The analysis strongly shows that the algorithm IPCA is very robust against the high rate of false positives and false negatives in protein interactions.


Modifying the DPClus algorithm for identifying protein complexes based on new topological structures.

Li M, Chen JE, Wang JX, Hu B, Chen G - BMC Bioinformatics (2008)

The robustness of IPCA against random edges addition and removal. (a) Various proportions of edges added to the protein interaction network randomly, (b) Various proportions of edges removed from the protein interaction network randomly.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2570695&req=5

Figure 7: The robustness of IPCA against random edges addition and removal. (a) Various proportions of edges added to the protein interaction network randomly, (b) Various proportions of edges removed from the protein interaction network randomly.
Mentions: In this analysis, we evaluated the robustness of the algorithm IPCA to various levels of graph alterations. Since all the methods of PPIs (Protein-Protein Interactions) prediction are known to yield a non-negligible amount of noise (false positives) and to miss a fraction of existing interactions (false negatives) [24], we tested the robustness of IPCA to false positive by adding edges randomly and to false negatives by removing edges randomly. Proportions of edges (0%, 10%, 20%, 30%,..., 90% and 100%) were added to the yeast protein interaction network randomly, and the same proportions (except that of 100%) of edges were removed from the yeast network randomly. It should be expected that the false positives would not randomly contribute to the formation of dense sub-graphs, and that the number of matched known complexes does not decrease fast with the increasing of false negatives. Figure 7 displays the impact of edge addition and removal on the results of the algorithm IPCA. As one can see, IPCA is barely affected by addition of up to 100% edges. It is also affected faintly by removal of up to 50% edges. It starts to drop perceivably from 60%, and a fast drop starts from 80%. However, there are still 93 known complexes matched to the predicted clusters (Tin = 0.9 and Tin = 0.6) when 80% edges are removed. The analysis strongly shows that the algorithm IPCA is very robust against the high rate of false positives and false negatives in protein interactions.

Bottom Line: Identification of protein complexes is crucial for understanding principles of cellular organization and functions.As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China. limin@mail.csu.edu.cn

ABSTRACT

Background: Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.

Results: Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density. Following the approach of that of the previously proposed clustering algorithm DPClus which expands clusters starting from seeded vertices, we present a clustering algorithm IPCA based on the new topological structure for identifying complexes in large protein interaction networks. The algorithm IPCA is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes. Experimental results show that the algorithm IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM.

Conclusion: The proposed algorithm based on the new topological structure makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rate of false positives and false negatives in data from high-throughout interaction techniques. The program is available at http://netlab.csu.edu.cn/bioinformatics/limin/IPCA.

Show MeSH