Inferring meaningful communities from topology-constrained correlation networks.

Hleap JS, Blouin C - PLoS ONE (2014)

Bottom Line: The α-amylase catalytic domain dataset (the biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained ones, but LDA filtering did not change the outcome of the latter. This suggests that LDA filtering is a robust way to resolve over-fragmentation when it is present, and that it does not affect results where there is no evidence of over-fragmentation.


Affiliation: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.

ABSTRACT
Community structure detection is an important tool in graph analysis. One way to do it is to solve for the partition that optimizes the modularity score Q. Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step for this optimization is proposed, based on Linear Discriminant Analysis (LDA) and a statistical test for significance. In structured simulations constrained by topology, this novel approach performs better than the optimization of modularity alone. The method was also tested with two empirical datasets: the roll-call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes that transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The α-amylase catalytic domain dataset (the biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained ones, but LDA filtering did not change the outcome of the latter. This suggests that LDA filtering is a robust way to resolve over-fragmentation when it is present, and that it does not affect results where there is no evidence of over-fragmentation.
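As a rough illustration of the quantity being optimized, the sketch below computes a modularity score for a given partition of a weighted, undirected graph. It assumes the standard Newman formulation, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the edge weight inside community c, d_c is the summed weighted degree of its nodes, and m is the total edge weight. Whether the paper uses exactly this form is an assumption; the function name `modularity` and the input layout are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q of a partition (assumed standard definition).

    edges: iterable of (u, v, weight), each undirected edge listed once.
    membership: dict mapping every node to its community label.
    """
    total = 0.0                      # m: total edge weight
    degree = defaultdict(float)      # weighted degree of each node
    internal = defaultdict(float)    # L_c: edge weight inside each community
    for u, v, w in edges:
        total += w
        degree[u] += w
        degree[v] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    if total == 0:
        return 0.0
    comm_degree = defaultdict(float)  # d_c: summed degree per community
    for node, k in degree.items():
        comm_degree[membership[node]] += k
    return sum(
        internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
        for c in comm_degree
    )


# Toy graph: two triangles joined by a single edge, unit weights.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 1.0)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
print(modularity(toy_edges, two_groups), modularity(toy_edges, one_group))
```

In this toy graph, splitting the two triangles scores higher than lumping all nodes together, which is the behaviour a modularity optimizer exploits. The LDA refinement and significance test described in the abstract are not reproduced here.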

Figure 4 (pone-0113438-g004): α-amylase homologs. Clusters (modules) found in an extension of the modularity inference performed in [1], including 135 homologs of the catalytic domain of the α-amylase. a) Modules inferred without constraining the topology with inter-residue contacts. b) Modules inferred when constraining the topology in (a) with inter-residue contacts. c) Modules inferred by prefiltering the results in (b), before significance testing.

Mentions: In Hleap et al. [1], a dataset of 85 protein structures was analyzed to find a sub-domain architecture. They found four significant clusters, one of which comprises the minimum functional TIM-barrel [1]. In this manuscript that search has been broadened to 135 structures. To show a biological application of the LDA pre-filtering, the algorithm described in [1] was run in three ways: without contact restraints, with inter-residue contact constraints, and with contact constraints plus LDA pre-filtering. Figure 4 shows the results, where each color represents a cluster of residues within the protein. In the absence of contact restraints (Figure 4a), bigger clusters are found. Some clusters are made of disconnected components (orange cluster). There are significantly fewer clusters than in the other cases (Figures 4b and 4c), and the biological meaning of the lack of contiguity is obscure. Disjoint components in a cluster may reflect a higher-level community, which is not of interest from a protein modularity perspective. Figure 4b shows the result for the same algorithm when a topology constraint based on inter-residue contacts is applied. Here, more sensible results are obtained, recovering the minimal functional TIM-barrel topology obtained in [1] (yellow cluster). Figure 4c corresponds to the same topology-constrained network as in Figure 4b, but with LDA pre-filtering; the result is identical. This suggests that the LDA-filtered community structure at the protein level is strong and significant enough to avoid merging. This observation makes sense, since Hleap et al. [1] were testing for correlation among residues, and this information can be correlated with the contacts between them. It is also important to note that when no over-fragmentation occurs (as in this particular dataset), LDA will not affect the result.
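The lack of contiguity noted for the unconstrained case can be checked mechanically. The sketch below is not the authors' procedure: it takes a hypothetical list of inter-residue contacts and a residue-to-cluster assignment, and splits each cluster into its connected components on the contact graph. A cluster with more than one component is disjoint in the sense described above. The function name `cluster_components` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict, deque


def cluster_components(contacts, membership):
    """Split each cluster into connected components of the contact graph.

    contacts: iterable of (residue_a, residue_b) pairs in contact.
    membership: dict mapping each residue to its cluster label.
    Returns a dict: label -> list of components (sorted residue lists).
    """
    neighbours = defaultdict(set)
    for a, b in contacts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = defaultdict(list)
    for start in sorted(membership):
        if start in seen:
            continue
        label = membership[start]
        seen.add(start)
        queue = deque([start])
        part = []
        # Breadth-first search restricted to residues of the same cluster.
        while queue:
            node = queue.popleft()
            part.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen and membership.get(nxt) == label:
                    seen.add(nxt)
                    queue.append(nxt)
        components[label].append(sorted(part))
    return dict(components)


# Toy chain of six residues; the "orange" cluster is split by "yellow".
toy_contacts = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
toy_clusters = {1: "orange", 2: "orange", 3: "yellow",
                4: "yellow", 5: "orange", 6: "orange"}
parts = cluster_components(toy_contacts, toy_clusters)
disjoint = [label for label, comps in parts.items() if len(comps) > 1]
print(parts, disjoint)
```

Constraining the network topology with contacts before clustering, as in Figure 4b, is a different operation from this after-the-fact check; the sketch only flags clusters whose members do not form one contiguous patch.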

