SCOUT: simultaneous time segmentation and community detection in dynamic networks

View Article: PubMed Central - PubMed

ABSTRACT

Many evolving complex real-world systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which finds groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization. The reality likely lies between these two extremes. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity. We apply SCOUT to biological network data to study human aging.
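To make the shape of an SCD solution concrete, the following is a minimal sketch of how one might represent it: a list of contiguous time segments plus one node partition per segment. The type names, the helper functions, and the toy data are illustrative assumptions and are not taken from SCOUT itself. The two helper constructors correspond to the two extremes the abstract describes (one community organization per time point versus a single organization for all time points).

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not SCOUT's own data structures).
Segment = Tuple[int, int]    # inclusive (start, end) indices of time points
Partition = Dict[str, int]   # node name -> community label


def is_valid_segmentation(segments: List[Segment], num_time_points: int) -> bool:
    """Return True if the segments are contiguous, non-overlapping,
    and together cover time points 0 .. num_time_points - 1 in order."""
    expected_start = 0
    for start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == num_time_points


def snapshot_extreme(num_time_points: int) -> List[Segment]:
    """One segment per time point: every snapshot gets its own partition."""
    return [(t, t) for t in range(num_time_points)]


def static_extreme(num_time_points: int) -> List[Segment]:
    """A single segment: all time points share one partition."""
    return [(0, num_time_points - 1)]


# Toy SCD solution for 6 snapshots: two segments, one partition each.
segments: List[Segment] = [(0, 2), (3, 5)]
partitions: List[Partition] = [
    {"a": 0, "b": 0, "c": 1},  # community organization for snapshots 0-2
    {"a": 0, "b": 1, "c": 1},  # community organization for snapshots 3-5
]
```

An SCD solution lies between `snapshot_extreme` and `static_extreme`: it uses as many segments as the data supports, with exactly one partition attached to each segment.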



Figure 2: Method comparison for synthetic networks with 100 nodes per snapshot with respect to (a) SimT and SimP (for the configuration with four ground truth segments), (b) change point classification, (c) QP, and (d) SimB. For a given synthetic network configuration, the results are averaged over all of the corresponding random network instances. In panel (b), we do not consider configurations with the minimum and maximum possible numbers of ground truth segments, because for these configurations either there are no change points at all (for one segment) or every time point is a change point (for 16 segments), and thus change point classification cannot be performed. In panel (c), the dotted lines correspond to the ground truth scores. Equivalent results for the remaining configurations are shown in Supplementary Figures S14 and S16.
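The exclusion of the two extreme configurations in panel (b) can be illustrated with a short sketch. It assumes 16 time points (implied by the 16-segment maximum), treats every time point after the first as a candidate change point, and uses precision and recall as stand-in classification measures. The excerpt does not state the exact measure or convention used in the paper, so all function names and these choices are assumptions for illustration only.

```python
from typing import List, Set, Tuple


def change_points(segment_lengths: List[int]) -> Set[int]:
    """0-indexed time points at which a new segment starts (t = 0 excluded)."""
    points: Set[int] = set()
    t = 0
    for length in segment_lengths[:-1]:
        t += length
        points.add(t)
    return points


def has_both_classes(segment_lengths: List[int]) -> bool:
    """True if the candidates 1 .. T-1 contain both change points and
    non-change points, so that a binary classification is well defined."""
    num_candidates = sum(segment_lengths) - 1
    num_change = len(change_points(segment_lengths))
    return 0 < num_change < num_candidates


def precision_recall(truth: Set[int], predicted: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted change points against the truth."""
    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth: 16 time points split into four equal segments.
truth = change_points([4, 4, 4, 4])          # {4, 8, 12}
# Hypothetical method output that over-segments by one extra boundary.
predicted = change_points([4, 4, 2, 2, 4])   # {4, 8, 10, 12}
print(precision_recall(truth, predicted))
```

With a single segment (`[16]`) there are no positive examples, and with 16 segments (`[1] * 16`) there are no negative examples, so `has_both_classes` returns `False` in both cases. This mirrors why the caption omits those configurations.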
© Copyright Policy - open-access

Mentions: For SimT, SCOUT is superior to all other methods: it achieves the highest scores for 90% of all synthetic network configurations, while the other methods are comparable to each other (Supplementary S13). The remaining 10% (i.e., two) of the configurations, in which an existing method (in this case, GraphScope) achieves higher scores, are those with the two largest numbers of nodes per snapshot and with the maximum possible number of segments (Supplementary S14). That GraphScope has higher SimT for these configurations is not surprising. GraphScope produces solutions with more segments than the other methods do, often overestimating the ground truth number of segments (Supplementary S15). For the configurations with the maximum possible number of segments, the most that GraphScope can output is the maximum number of segments itself, which is the correct solution, so its tendency to overestimate cannot hurt it there.

Note that when measuring SimT for the extreme configurations with the minimum and maximum possible numbers of segments, we exclude Multi-Step from the comparison. This is because of Multi-Step’s unfair advantage (see above), which for these extreme configurations amounts to knowing the correct segmentation a priori and thus achieving a perfect SimT (Supplementary S14). For the remaining non-extreme configurations, Multi-Step is always outperformed by SCOUT and by at least one of the existing methods (Supplementary S13). Thus, Multi-Step, which knows the ground truth number of segments a priori, typically does not yield a high-quality segmentation with respect to SimT, whereas SCOUT does so (and typically better than the other methods) despite not having this prior knowledge (Fig. 2(a) and Supplementary S14). This is further confirmed by SCOUT automatically determining the ground truth number of segments more accurately than the existing methods do (Supplementary S15).

