Limits...
Identifying influential nodes in large-scale directed networks: the role of clustering.

Chen DB, Gao H, Lü L, Zhou T - PLoS ONE (2013)

Bottom Line: Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient.Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that the ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank.Furthermore, ClusterRank can also be applied to undirected networks where the superiority of ClusterRank is significant compared with degree centrality and k-core decomposition.

View Article: PubMed Central - PubMed

Affiliation: Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People's Republic of China ; Department of Physics, University of Fribourg, Fribourg, Switzerland.

ABSTRACT
Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node's neighbors but do not directly make use of the interactions among it's neighbors. Local clustering is known to have negative impacts on the information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient. Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that the ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks where the superiority of ClusterRank is significant compared with degree centrality and k-core decomposition. In addition, ClusterRank, only making use of local information, is much more efficient than global methods: It takes only 191 seconds for a network with about [Formula: see text] nodes, more than 15 times faster than PageRank.

Show MeSH

Related in: MedlinePlus

The dependance of  on parameter  in undirected Delicious and Cond-mat networks. We set .In (a) and (c), the initial infected nodes are those non-overlapped nodes in the top-50 places regardless of whether they are connected or not. In (b) and (d), the initial infected nodes are the non-overlapped nodes in top-50 places under constraint that any two of them are not connected. Each data point is obtained by averaging over 100 independent runs.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC3814409&req=5

pone-0077455-g008: The dependance of on parameter in undirected Delicious and Cond-mat networks. We set .In (a) and (c), the initial infected nodes are those non-overlapped nodes in the top-50 places regardless of whether they are connected or not. In (b) and (d), the initial infected nodes are the non-overlapped nodes in top-50 places under constraint that any two of them are not connected. Each data point is obtained by averaging over 100 independent runs.

Mentions: Figure 8 shows the dependence of on for the undirected Delicious network and Cond-mat network, where is the ratio of the total number of infected and recovered nodes to all nodes at time for ClusterRank, and stands for the corresponding quantity of degree centrality or k-core decomposition. For the first case, see figures 8(a) and 8(c), the eventually infected size of ClusterRank is larger than that of degree centrality and k-core decomposition. In DeliciousUN, the largest value for k-core decomposition is 3.97 which is about 2.5 times larger than that for degree centrality. This reminds us that as a group of initial infected nodes, k-core decomposition may perform even worse than degree centrality [6], since the selected nodes identified by k-core decomposition are usually in the same core and thus densely connected with each other while the nodes selected by degree centrality or ClusterRank are usually located at different cores and thus sparsely connected. Apparently, ClusterRank is much more advanced than degree centrality. Similar results are also found in Cond-mat network, see figure 8(c). Note that, Cond-mat network is highly clustered with clustering coefficient , because there are many cliques each of which is constituted by a group of co-authors of a paper. Therefore the authors whose collaborators closely collaborate with each other will be highly depressed by ClusterRank due to their high clustering coefficients. The researcher with diverse collaborators who are usually belong to different communities will be more influential than those who only collaborates with people in one community. For the second case, with the consideration of the nodes that are not directly connected with each other the performance of -core decomposition is improved. Specifically, in DeliciousUN, ClusterRank performs much better than degree centrality especially for the middle region of and better than that of k-core decomposition for . In Cond-mat network, the results of ClusterRank are still better than degree centrality and -core decomposition in the middle region of , and for other region, their performances are comparable. The investigations for very small or very large infected probability are meaningless. When is too small (e.g., ), it will be hardly spread out from any group of initial nodes, and for large , most of the nodes will get infected and thus the difference resulted from initialization will become less significant. The results shown in figure 8 demonstrate that ClusterRank also performs better than degree centrality and k-core decomposition in undirected networks.


Identifying influential nodes in large-scale directed networks: the role of clustering.

Chen DB, Gao H, Lü L, Zhou T - PLoS ONE (2013)

The dependance of  on parameter  in undirected Delicious and Cond-mat networks. We set .In (a) and (c), the initial infected nodes are those non-overlapped nodes in the top-50 places regardless of whether they are connected or not. In (b) and (d), the initial infected nodes are the non-overlapped nodes in top-50 places under constraint that any two of them are not connected. Each data point is obtained by averaging over 100 independent runs.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC3814409&req=5

pone-0077455-g008: The dependance of on parameter in undirected Delicious and Cond-mat networks. We set .In (a) and (c), the initial infected nodes are those non-overlapped nodes in the top-50 places regardless of whether they are connected or not. In (b) and (d), the initial infected nodes are the non-overlapped nodes in top-50 places under constraint that any two of them are not connected. Each data point is obtained by averaging over 100 independent runs.
Mentions: Figure 8 shows the dependence of on for the undirected Delicious network and Cond-mat network, where is the ratio of the total number of infected and recovered nodes to all nodes at time for ClusterRank, and stands for the corresponding quantity of degree centrality or k-core decomposition. For the first case, see figures 8(a) and 8(c), the eventually infected size of ClusterRank is larger than that of degree centrality and k-core decomposition. In DeliciousUN, the largest value for k-core decomposition is 3.97 which is about 2.5 times larger than that for degree centrality. This reminds us that as a group of initial infected nodes, k-core decomposition may perform even worse than degree centrality [6], since the selected nodes identified by k-core decomposition are usually in the same core and thus densely connected with each other while the nodes selected by degree centrality or ClusterRank are usually located at different cores and thus sparsely connected. Apparently, ClusterRank is much more advanced than degree centrality. Similar results are also found in Cond-mat network, see figure 8(c). Note that, Cond-mat network is highly clustered with clustering coefficient , because there are many cliques each of which is constituted by a group of co-authors of a paper. Therefore the authors whose collaborators closely collaborate with each other will be highly depressed by ClusterRank due to their high clustering coefficients. The researcher with diverse collaborators who are usually belong to different communities will be more influential than those who only collaborates with people in one community. For the second case, with the consideration of the nodes that are not directly connected with each other the performance of -core decomposition is improved. Specifically, in DeliciousUN, ClusterRank performs much better than degree centrality especially for the middle region of and better than that of k-core decomposition for . In Cond-mat network, the results of ClusterRank are still better than degree centrality and -core decomposition in the middle region of , and for other region, their performances are comparable. The investigations for very small or very large infected probability are meaningless. When is too small (e.g., ), it will be hardly spread out from any group of initial nodes, and for large , most of the nodes will get infected and thus the difference resulted from initialization will become less significant. The results shown in figure 8 demonstrate that ClusterRank also performs better than degree centrality and k-core decomposition in undirected networks.

Bottom Line: Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient.Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that the ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank.Furthermore, ClusterRank can also be applied to undirected networks where the superiority of ClusterRank is significant compared with degree centrality and k-core decomposition.

View Article: PubMed Central - PubMed

Affiliation: Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People's Republic of China ; Department of Physics, University of Fribourg, Fribourg, Switzerland.

ABSTRACT
Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node's neighbors but do not directly make use of the interactions among it's neighbors. Local clustering is known to have negative impacts on the information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors' influences, but also the clustering coefficient. Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that the ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks where the superiority of ClusterRank is significant compared with degree centrality and k-core decomposition. In addition, ClusterRank, only making use of local information, is much more efficient than global methods: It takes only 191 seconds for a network with about [Formula: see text] nodes, more than 15 times faster than PageRank.

Show MeSH
Related in: MedlinePlus