A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome.

Ghiassian SD, Menche J, Barabási AL - PLoS Comput. Biol. (2015)

Bottom Line: The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity.

Affiliation: Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America; Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.

ABSTRACT
The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
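
One way to make "connectivity significance" concrete is a hypergeometric tail probability: for a protein with a given number of links, how likely is it to have at least the observed number of links to seed proteins by chance? The abstract does not state the formula, so the definition below, the function names, and the toy network are assumptions for illustration, not the authors' implementation. The sketch repeatedly adds the candidate with the smallest p-value to the module.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """Hypergeometric tail: chance that at least `seed_links` of a node's
    `degree` links reach seeds, if links were placed at random among
    `n_nodes` nodes of which `n_seeds` are seeds (assumed null model)."""
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return tail / comb(n_nodes, degree)


def diamond_sketch(adjacency, seeds, n_iterations):
    """Greedy module growth: at each step add the non-module node whose
    links to the current module are most significant (lowest p-value)."""
    module = set(seeds)
    added = []
    n_nodes = len(adjacency)
    for _ in range(n_iterations):
        best, best_p = None, None
        for node in sorted(adjacency):  # sorted for deterministic tie-breaking
            if node in module:
                continue
            neighbors = adjacency[node]
            seed_links = len(neighbors & module)
            if seed_links == 0:
                continue  # no link to the module, never the best candidate
            p = connectivity_pvalue(n_nodes, len(module), len(neighbors), seed_links)
            if best_p is None or p < best_p:
                best, best_p = node, p
        if best is None:
            break
        module.add(best)
        added.append(best)
    return added


# Hypothetical toy network (undirected, stored as neighbor sets).
toy_network = {
    "a": {"b", "c", "x"},
    "b": {"a", "c", "x"},
    "c": {"a", "b", "y"},
    "x": {"a", "b"},
    "y": {"c", "z"},
    "z": {"y", "w"},
    "w": {"z"},
}
ranking = diamond_sketch(toy_network, {"a", "b"}, 2)
print(ranking)  # ['x', 'c']
```

In the toy network, node x is ranked before node c even though both have two links to the seeds: x has only two links in total, so both of them hitting seeds is less likely by chance. This illustrates why significance, rather than the raw number of links to the module, drives the ranking.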

pcbi.1004120.g003: Performance evaluation of DIAMOnD. We use two different methods to construct synthetic modules (shells and connectivity modules). (A, B) Recovery rate of the DIAMOnD algorithm when removing 50% of seed nodes from shells (A) and connectivity synthetic modules (B), respectively. The recovery rate in synthetic modules is roughly independent of the module incompleteness. (C, D) Recovery rate when 25%, 50% and 75% of the nodes are removed from shells and connectivity modules. (E, F) Recovery rate when 10%, 20% and 30% of the nodes are removed from the disease proteins of lysosomal storage diseases and lipid metabolism disorders. (G) Robustness of the DIAMOnD algorithm towards small variations in the starting seed proteins (N-1 analysis). While most nodes influence the outcome very little, there are a few nodes whose removal results in a large deviation from the original outcome. This deviation may either persist across iterations (red data points) or disappear after a few iterations (green). (H) Crucial nodes are characterized by a 3–4 times higher degree. (I) DIAMOnD robustness towards random link removal from the Interactome. We identified the DIAMOnD proteins for 70 diseases in the original Interactome as well as in perturbed networks with varying fractions f of randomly removed links. Data points and bars represent the median and median absolute deviation of the overlap (number of common proteins) between original and randomized DIAMOnD sets across 70 diseases as a function of the iteration step. (J) Same as (I), but for perturbed networks in which varying fractions f of all links have been randomly rewired.
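
The robustness summary in panels (I) and (J) can be sketched as follows: perturb the network, rerun the ranking, count the proteins shared by the original and perturbed rankings at each iteration step, and summarize across diseases with the median and the median absolute deviation. The helper names and the edge-list representation are hypothetical, and reading "overlap at iteration step i" as the overlap of the first i proteins of each ranking is an assumption.

```python
import random
from statistics import median


def remove_links(edges, fraction, rng):
    """Return a copy of the edge set with `fraction` of links removed at random."""
    edge_list = sorted(edges)
    n_keep = len(edge_list) - round(fraction * len(edge_list))
    return set(rng.sample(edge_list, n_keep))


def overlap_per_step(original, perturbed):
    """Number of common proteins among the first i entries of both rankings."""
    n_steps = min(len(original), len(perturbed))
    return [
        len(set(original[:i]) & set(perturbed[:i]))
        for i in range(1, n_steps + 1)
    ]


def median_and_mad(values):
    """Median and median absolute deviation of a list of numbers."""
    center = median(values)
    spread = median([abs(v - center) for v in values])
    return center, spread


# Hypothetical rankings for one disease, before and after perturbation.
overlaps = overlap_per_step(["a", "b", "c"], ["b", "a", "d"])
print(overlaps)  # [0, 2, 2]

# Hypothetical overlaps at one iteration step, one value per disease.
print(median_and_mad([1, 2, 3, 4, 100]))  # (3, 1)
```

The median and median absolute deviation are less sensitive to a single outlying disease than the mean and standard deviation, as the second example shows.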

Mentions: For each initially connected synthetic module, we randomly removed a certain fraction (25%, 50% and 75%) of the nodes and used the remaining nodes as seed proteins for DIAMOnD. Fig. 3A and 3B show the fraction of recaptured initial seed nodes (recall) as a function of the number of iterations of the algorithm when 50% of the module is removed. As expected, the highest rate of true positives is achieved in early iterations, so the highest-ranked proteins are the most likely to be part of the original full module.
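
The evaluation described above can be sketched in a few lines: hide a fraction of a module, keep the rest as seeds, and track what share of the hidden nodes has been recovered after each iteration. The function names are hypothetical, and treating the removed nodes as the positives for recall is this sketch's reading of the text. The ranking itself would come from the module detection algorithm and is passed in here as a plain list.

```python
import random


def split_module(module, fraction_removed, rng):
    """Randomly hide a fraction of the module; return (seeds, removed)."""
    nodes = sorted(module)
    n_removed = round(fraction_removed * len(nodes))
    removed = set(rng.sample(nodes, n_removed))
    return set(nodes) - removed, removed


def recall_curve(ranked_nodes, removed):
    """Recall after each iteration: share of removed nodes recovered so far."""
    if not removed:
        raise ValueError("at least one node must be removed")
    hits = 0
    curve = []
    for node in ranked_nodes:
        if node in removed:
            hits += 1
        curve.append(hits / len(removed))
    return curve


# Hypothetical module of eight nodes with half of them hidden.
seeds, removed = split_module(set(range(8)), 0.5, random.Random(0))

# Hypothetical ranking in which the first and third picks are true positives.
print(recall_curve(["x", "q", "y"], {"x", "y"}))  # [0.5, 0.5, 1.0]
```

A curve that rises steeply in the first iterations and then flattens corresponds to the behavior reported above, where true positives are concentrated among the highest-ranked proteins.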

