Limits...
Exploring the limits of community detection strategies in complex networks.

Aldecoa R, Marín I - Sci Rep (2013)

Bottom Line: The characterization of network community structure has profound implications in several scientific areas.Surprise, a global parameter that evaluates the quality of a partition, confirms the power of these analyses.We show that none of the community detection algorithms tested provide consistently optimal results in all networks and that Surprise maximization, obtained by combining multiple algorithms, obtains quasi-optimal performances in these difficult benchmarks.

View Article: PubMed Central - PubMed

Affiliation: Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Calle Jaime Roig 11, 46010, Valencia, Spain.

ABSTRACT
The characterization of network community structure has profound implications in several scientific areas. Therefore, testing the algorithms developed to establish the optimal division of a network into communities is a fundamental problem in the field. We performed here a highly detailed evaluation of community detection algorithms, which has two main novelties: 1) using complex closed benchmarks, which provide precise ways to assess whether the solutions generated by the algorithms are optimal; and, 2) A novel type of analysis, based on hierarchically clustering the solutions suggested by multiple community detection algorithms, which allows to easily visualize how different are those solutions. Surprise, a global parameter that evaluates the quality of a partition, confirms the power of these analyses. We show that none of the community detection algorithms tested provide consistently optimal results in all networks and that Surprise maximization, obtained by combining multiple algorithms, obtains quasi-optimal performances in these difficult benchmarks.

No MeSH data available.


Details of the performance of the best algorithms in LFR benchmarks.The y-axis (VIδ) corresponds to the difference between the expected value, VIIF and the VIIE + VIEF value of the different solutions. VIδ values close to zero correspond to the best performers.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3713530&req=5

f5: Details of the performance of the best algorithms in LFR benchmarks.The y-axis (VIδ) corresponds to the difference between the expected value, VIIF and the VIIE + VIEF value of the different solutions. VIδ values close to zero correspond to the best performers.

Mentions: Figures 5 and 6 show in more detail the deviations from the optimal values, indicated as VIδ = VIIF − (VIIE + VIEF), of the six best algorithms of each benchmark. This value is equal to 0 when agreement with the expected optimal performance is perfect. The larger the deviations from that optimal behavior, the more negative are the values of VIδ. In the LFR benchmarks (Fig. 5), we confirmed that Infomap outperformed the other five algorithms. Its solutions were just very slightly different from the optimal ones around C = 50%. The other algorithms displayed two different types of behaviors. On one hand, MCL, RB and LPA progressively separated from the optimal value toward the center of the benchmark. Notice, however, that this minimum should appear exactly when C = 50%, this not being the case for RB, which showed slightly asymmetric results (Figure 5). On the other hand, RN and CPM reached a fixed or almost fixed value that was maintained during a large part of the evolution of the network. This means that, during that period, these algorithms were either constantly obtaining the same solution or finding very similar solutions, regardless of the network analyzed. In fact, RN always allocated all nodes to different communities while CPM split the 5000 nodes into variable groups, all them with one to four units. Figure 6 displays the analogous analyses for the RC benchmarks. We confirmed that SCluster and RNSC were clearly the best-performing strategies. The other algorithms satisfied the condition of optimality only when the network analyzed was very similar to either the initial or the final structure. This detailed analyses also showed more clearly something that could be suspected already looking at Figure 3, namely that RN and CPM produced abnormal patterns. The quasi-constant value of RN around C = 50% is explained by the fact that all its solutions in the center of the benchmark consisted of two clusters, one of them containing more than 99% of the nodes. On the other hand, CPM displayed an unstable behavior. The results in Figures 1,2,3,4,5,6 indicate that the RC benchmarks are at least as difficult as the LFR benchmarks, even though the number of nodes is much smaller (512 versus 5000). The considerable density of links and the highly skewed distribution of community sizes in the RC networks explain this fact.


Exploring the limits of community detection strategies in complex networks.

Aldecoa R, Marín I - Sci Rep (2013)

Details of the performance of the best algorithms in LFR benchmarks.The y-axis (VIδ) corresponds to the difference between the expected value, VIIF and the VIIE + VIEF value of the different solutions. VIδ values close to zero correspond to the best performers.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3713530&req=5

f5: Details of the performance of the best algorithms in LFR benchmarks.The y-axis (VIδ) corresponds to the difference between the expected value, VIIF and the VIIE + VIEF value of the different solutions. VIδ values close to zero correspond to the best performers.
Mentions: Figures 5 and 6 show in more detail the deviations from the optimal values, indicated as VIδ = VIIF − (VIIE + VIEF), of the six best algorithms of each benchmark. This value is equal to 0 when agreement with the expected optimal performance is perfect. The larger the deviations from that optimal behavior, the more negative are the values of VIδ. In the LFR benchmarks (Fig. 5), we confirmed that Infomap outperformed the other five algorithms. Its solutions were just very slightly different from the optimal ones around C = 50%. The other algorithms displayed two different types of behaviors. On one hand, MCL, RB and LPA progressively separated from the optimal value toward the center of the benchmark. Notice, however, that this minimum should appear exactly when C = 50%, this not being the case for RB, which showed slightly asymmetric results (Figure 5). On the other hand, RN and CPM reached a fixed or almost fixed value that was maintained during a large part of the evolution of the network. This means that, during that period, these algorithms were either constantly obtaining the same solution or finding very similar solutions, regardless of the network analyzed. In fact, RN always allocated all nodes to different communities while CPM split the 5000 nodes into variable groups, all them with one to four units. Figure 6 displays the analogous analyses for the RC benchmarks. We confirmed that SCluster and RNSC were clearly the best-performing strategies. The other algorithms satisfied the condition of optimality only when the network analyzed was very similar to either the initial or the final structure. This detailed analyses also showed more clearly something that could be suspected already looking at Figure 3, namely that RN and CPM produced abnormal patterns. The quasi-constant value of RN around C = 50% is explained by the fact that all its solutions in the center of the benchmark consisted of two clusters, one of them containing more than 99% of the nodes. On the other hand, CPM displayed an unstable behavior. The results in Figures 1,2,3,4,5,6 indicate that the RC benchmarks are at least as difficult as the LFR benchmarks, even though the number of nodes is much smaller (512 versus 5000). The considerable density of links and the highly skewed distribution of community sizes in the RC networks explain this fact.

Bottom Line: The characterization of network community structure has profound implications in several scientific areas.Surprise, a global parameter that evaluates the quality of a partition, confirms the power of these analyses.We show that none of the community detection algorithms tested provide consistently optimal results in all networks and that Surprise maximization, obtained by combining multiple algorithms, obtains quasi-optimal performances in these difficult benchmarks.

View Article: PubMed Central - PubMed

Affiliation: Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Calle Jaime Roig 11, 46010, Valencia, Spain.

ABSTRACT
The characterization of network community structure has profound implications in several scientific areas. Therefore, testing the algorithms developed to establish the optimal division of a network into communities is a fundamental problem in the field. We performed here a highly detailed evaluation of community detection algorithms, which has two main novelties: 1) using complex closed benchmarks, which provide precise ways to assess whether the solutions generated by the algorithms are optimal; and, 2) A novel type of analysis, based on hierarchically clustering the solutions suggested by multiple community detection algorithms, which allows to easily visualize how different are those solutions. Surprise, a global parameter that evaluates the quality of a partition, confirms the power of these analyses. We show that none of the community detection algorithms tested provide consistently optimal results in all networks and that Surprise maximization, obtained by combining multiple algorithms, obtains quasi-optimal performances in these difficult benchmarks.

No MeSH data available.