Limits...
RRW: repeated random walks on genome-scale protein networks for local cluster discovery.

Macropol K, Can T, Singh AK - BMC Bioinformatics (2009)

Bottom Line: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins.RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters.In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. kpm@cs.ucsb.edu

ABSTRACT

Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins.

Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values.

Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.

Show MeSH

Related in: MedlinePlus

Random walk from a cluster algorithm. Pseudocode for the algorithm used to simulate a random walk with restarts from a cluster of vertices.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2748087&req=5

Figure 3: Random walk from a cluster algorithm. Pseudocode for the algorithm used to simulate a random walk with restarts from a cluster of vertices.

Mentions: A sketch of the repeated random walk (RRW) and ClusterRWSimulation algorithms is given in the Methods Section (Figures 2 and 3). Starting from every node in the network, the RandomWalk method is run, and the resulting affinity vectors associated with each single node are saved. Sets of strongly connected proteins are then found by again starting from every node in the network and expanding the clusters repeatedly using the ClusterRWSimulation method. This method utilizes the vectors found in the RandomWalk method to quickly obtain the random walk affinity vectors, and the closest protein to the current cluster is found. This protein is added to the cluster, its score remembered, and resulting in a new cluster to be further expanded. This process is continued until either the next protein to be added's score is not within a given percentage, λ (the early cutoff), of the previously added protein's score, or we reach the maximum cluster size k. All clusters created during expansion are saved.


RRW: repeated random walks on genome-scale protein networks for local cluster discovery.

Macropol K, Can T, Singh AK - BMC Bioinformatics (2009)

Random walk from a cluster algorithm. Pseudocode for the algorithm used to simulate a random walk with restarts from a cluster of vertices.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2748087&req=5

Figure 3: Random walk from a cluster algorithm. Pseudocode for the algorithm used to simulate a random walk with restarts from a cluster of vertices.
Mentions: A sketch of the repeated random walk (RRW) and ClusterRWSimulation algorithms is given in the Methods Section (Figures 2 and 3). Starting from every node in the network, the RandomWalk method is run, and the resulting affinity vectors associated with each single node are saved. Sets of strongly connected proteins are then found by again starting from every node in the network and expanding the clusters repeatedly using the ClusterRWSimulation method. This method utilizes the vectors found in the RandomWalk method to quickly obtain the random walk affinity vectors, and the closest protein to the current cluster is found. This protein is added to the cluster, its score remembered, and resulting in a new cluster to be further expanded. This process is continued until either the next protein to be added's score is not within a given percentage, λ (the early cutoff), of the previously added protein's score, or we reach the maximum cluster size k. All clusters created during expansion are saved.

Bottom Line: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins.RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters.In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. kpm@cs.ucsb.edu

ABSTRACT

Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins.

Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values.

Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.

Show MeSH
Related in: MedlinePlus