Modifying the DPClus algorithm for identifying protein complexes based on new topological structures.

Li M, Chen JE, Wang JX, Hu B, Chen G - BMC Bioinformatics (2008)

Bottom Line: Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction set increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks. Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density.


Affiliation: School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China. limin@mail.csu.edu.cn

ABSTRACT

Background: Identification of protein complexes is crucial for understanding the principles of cellular organization and functions. As the set of known protein-protein interactions grows, the general trend is to represent the interactions as a network and to develop effective algorithms that detect significant complexes in such networks.

Results: Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes that combines subgraph diameter (or average vertex distance) with subgraph density. Following the approach of the previously proposed clustering algorithm DPClus, which expands clusters starting from seed vertices, we present a clustering algorithm, IPCA, based on the new topological structure for identifying complexes in large protein interaction networks. Applied to the protein interaction network of Saccharomyces cerevisiae, IPCA identifies many well-known complexes. Experimental results show that IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM.

Conclusion: The proposed algorithm, based on the new topological structure, makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rates of false positives and false negatives in data from high-throughput interaction techniques. The program is available at http://netlab.csu.edu.cn/bioinformatics/limin/IPCA.
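
The Results paragraph characterizes a complex by subgraph diameter (or average vertex distance) together with subgraph density. A minimal sketch of these three quantities follows, assuming an unweighted, undirected network stored as a dict of neighbor sets and the common density definition 2|E| / (|V|(|V| - 1)). Distances are measured inside the induced subgraph, which is an assumption of this sketch, and the helper names are hypothetical rather than taken from the IPCA program. The SP ≤ 2 and ASP ≤ 2 settings used for Figure 3 presumably bound the diameter and the average distance, respectively.

```python
import math
from collections import deque
from itertools import combinations


def make_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_within(adj, source, nodes):
    """BFS hop counts from `source`, walking only through vertices in `nodes`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w in nodes and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def subgraph_metrics(adj, nodes):
    """Return (density, diameter, average vertex distance) of the induced subgraph.

    Subgraphs with fewer than two vertices are reported as (0.0, 0, 0.0).
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0, 0, 0.0
    # Count each induced edge once by looking at unordered vertex pairs.
    m = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    density = 2 * m / (n * (n - 1))
    order = sorted(nodes)
    pair_dists = []
    for i, u in enumerate(order):
        dist = distances_within(adj, u, nodes)
        # Unreachable pairs get infinity, so a disconnected subgraph fails any bound.
        pair_dists.extend(dist.get(v, math.inf) for v in order[i + 1:])
    return density, max(pair_dists), sum(pair_dists) / len(pair_dists)


# Hypothetical four-protein example: A-B, B-C, C-D and A-C.
adj = make_adj([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])
print(subgraph_metrics(adj, {"A", "B", "C", "D"}))
```

For the four-vertex example, the induced subgraph has 4 of 6 possible edges (density 2/3), diameter 2 and average vertex distance 4/3, so it would satisfy both a diameter bound of 2 and an average-distance bound of 2.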

Figure 3: The effect of Tin on clustering. Nine cluster sets are generated from the yeast network by IPCA using SP ≤ 2 and Tin = 0.1, 0.2, ..., 0.9, and nine more using ASP ≤ 2 with the same Tin values. (a) The total number of predicted clusters, (b) the number of predicted clusters with size > 2, (c) the size of the largest predicted cluster, (d) the average size of the predicted clusters.

Mentions: To understand how the value of Tin influences the outcome of clustering, we generate 18 sets of clusters from the yeast protein interaction network, using SP ≤ 2 and ASP ≤ 2, each with Tin = 0.1, 0.2, ..., 0.9. The effect of Tin on the predicted clusters is shown in Figure 3. Figure 3(a) shows that the total number of predicted clusters increases as Tin increases. However, Figure 3(b) shows an abrupt decrease at Tin = 0.5, which is probably caused by hub structures in the protein interaction network: when Tin = 0.5, these hub structures are decomposed into complexes that consist of only 2 proteins, which therefore no longer count toward the clusters with size > 2.
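
This excerpt uses Tin and the SP/ASP bounds without defining them. The sketch below adopts a hypothetical reading: Tin is a lower bound on the fraction of current cluster members that a candidate protein interacts with, and SP ≤ 2 requires the candidate to lie within two hops of every member inside the grown subgraph (a diameter bound). Seeding by degree, the candidate ranking, and all function names are simplifications chosen for illustration, not details of the IPCA program. It also computes the four per-run statistics of Figure 3 so that a Tin sweep can be inspected.

```python
from collections import deque


def make_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def farthest_hop(adj, source, nodes):
    """Largest BFS distance from `source` to any vertex of `nodes`, staying inside `nodes`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(nodes):
        return float("inf")  # some member is unreachable inside the subgraph
    return max(dist.values())


def expand_clusters(adj, t_in, max_sp=2):
    """Seed-and-grow clustering (hypothetical sketch, not the published IPCA code).

    A candidate v joins cluster K only if
      * it interacts with at least a fraction t_in of K's members, and
      * every member stays within max_sp hops of v inside K + {v}.
    Clusters may overlap; an already clustered vertex is just never used as a seed.
    """
    seeds = sorted(adj, key=lambda v: (-len(adj[v]), v))  # high degree first
    clustered, clusters = set(), []
    for seed in seeds:
        if seed in clustered:
            continue
        cluster = {seed}
        grown = True
        while grown:
            grown = False
            frontier = {w for u in cluster for w in adj[u]} - cluster
            # Try the best-connected candidates first; ties broken by name.
            for v in sorted(frontier, key=lambda w: (-len(adj[w] & cluster), w)):
                in_vk = len(adj[v] & cluster) / len(cluster)
                if in_vk >= t_in and farthest_hop(adj, v, cluster | {v}) <= max_sp:
                    cluster.add(v)
                    grown = True
                    break  # re-rank the frontier against the enlarged cluster
        clusters.append(cluster)
        clustered |= cluster
    return clusters


def cluster_stats(clusters):
    """The four quantities plotted in panels (a)-(d) of Figure 3."""
    sizes = [len(c) for c in clusters]
    return {
        "total": len(sizes),
        "size_gt_2": sum(1 for s in sizes if s > 2),
        "largest": max(sizes, default=0),
        "average": sum(sizes) / len(sizes) if sizes else 0.0,
    }


# Toy network (hypothetical): a hub H with four partners, plus a triangle X-Y-Z.
adj = make_adj([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"),
                ("X", "Y"), ("Y", "Z"), ("X", "Z")])
for t_in in (0.2, 0.5, 0.6):
    print(t_in, cluster_stats(expand_clusters(adj, t_in)))
```

On this toy network, raising Tin from 0.2 to 0.6 raises the number of clusters from 2 to 5, and at Tin = 0.6 the hub H ends up in several two-protein clusters, qualitatively the kind of hub decomposition described above. The sketch is not meant to reproduce Figure 3.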

