Modifying the DPClus algorithm for identifying protein complexes based on new topological structures.
Bottom Line:
Identification of protein complexes is crucial for understanding principles of cellular organization and functions.As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density.
Affiliation: School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China. limin@mail.csu.edu.cn
ABSTRACT
Show MeSH
Background: Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks. Results: Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density. Following the approach of that of the previously proposed clustering algorithm DPClus which expands clusters starting from seeded vertices, we present a clustering algorithm IPCA based on the new topological structure for identifying complexes in large protein interaction networks. The algorithm IPCA is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes. Experimental results show that the algorithm IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM. Conclusion: The proposed algorithm based on the new topological structure makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rate of false positives and false negatives in data from high-throughout interaction techniques. The program is available at http://netlab.csu.edu.cn/bioinformatics/limin/IPCA. |
Related In:
Results -
Collection
License getmorefigures.php?uid=PMC2570695&req=5
Mentions: To evaluate whether the clusters generated by the algorithm IPCA from the protein interaction network are biologically significant, we experiment the algorithm on the protein interaction network of yeast and on a random network of a similar structure. The random network, which has the same size and the same degree distribution as the yeast network, is obtained by shu2ing the edges between vertices in the yeast network. More clusters are generated from the random network than from the yeast network, and the clusters generated from the random network have less proteins than those generated from the yeast network. Figure 5 shows the size distributions of the clusters generated by IPCA using Tin = 0.6 from the yeast network and from the random network. As shown in the Figure, the predicted clusters identified in the yeast network are in various sizes from 2 to 25, while those in the random network are in various size from 2 to 10. Many small clusters are detected in the random network. To evaluate whether all these small clusters in the random network are significant, we compare them with the known complexes. As shown in Figure 6, while there are more than 100 known complexes matched by the predicted clusters identified in the yeast network when the overlapping score threshold is larger than 0.2, there are almost no known complexes matched by the predicted clusters identified in the random network when the overlapping score threshold is larger than 0.2. This result shows that the random network destroys the biological intrinsic character in the protein interaction network, though it has the same degree distribution as the original yeast network. |
View Article: PubMed Central - HTML - PubMed
Affiliation: School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China. limin@mail.csu.edu.cn
Background: Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks.
Results: Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density. Following the approach of that of the previously proposed clustering algorithm DPClus which expands clusters starting from seeded vertices, we present a clustering algorithm IPCA based on the new topological structure for identifying complexes in large protein interaction networks. The algorithm IPCA is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes. Experimental results show that the algorithm IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM.
Conclusion: The proposed algorithm based on the new topological structure makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rate of false positives and false negatives in data from high-throughout interaction techniques. The program is available at http://netlab.csu.edu.cn/bioinformatics/limin/IPCA.