A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome.

Ghiassian SD, Menche J, Barabási AL - PLoS Comput. Biol. (2015)

Bottom Line: The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity.


Affiliation: Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America; Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.

ABSTRACT
The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.


Figure 1 (pcbi.1004120.g001): Topological properties of disease proteins within the Interactome. (A) Proteins associated with the same phenotype tend to localize in specific neighborhoods of the Interactome, indicating the approximate location of the corresponding disease modules. Topological network communities are highly interconnected groups of nodes. (B) Distribution of the fraction of disease proteins within the largest connected component (LCC) for 70 diseases. Only 10%-30% of the disease proteins are part of the LCC. (C) LCC size of proteins associated with lysosomal storage disease compared to random expectation. Out of 45 disease proteins, 24 (53%) are part of the LCC (z-score = 23.42, empirical p-value < 10^-6). (D) Significance of the LCC sizes as measured by the z-score for all 70 considered diseases. The whiskers indicate the minimum, 25th, 50th, 75th percentile and maximum across all diseases. Overall, 70% of the diseases show significant clustering (z-score > 1.6). (E) LCC z-score distribution in noisy networks in which a fraction f of all links is randomized by either link removal or rewiring. (F) We applied three representative community detection algorithms to explore the extent to which topological modules correspond to disease modules. Only 1%-5% of the communities detected by the different methods are significantly enriched with disease proteins, none of which includes a significant fraction of all disease proteins. (G) Comparison of the distribution of the local modularity R for disease proteins and proteins randomly selected from the Interactome. (H) Distribution of the connectivity significance of disease proteins and randomly selected proteins. (I) Connectivity significance of disease proteins as a function of the fraction f of links removed from the network. The red bars denote the mean and the standard deviation as measured across 70 diseases, yellow bars show random expectation obtained from the same number of randomly distributed genes. (J) Local modularity of disease proteins and randomly selected proteins when a fraction f of the links is removed from the network. (K) Illustration of the local modularity R.
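The LCC significance reported in panels (C) and (D) can be illustrated with a minimal sketch. It assumes the random expectation is obtained by drawing the same number of proteins uniformly at random from the network; the excerpt does not specify the null model, so the sampling scheme, the function names `lcc_size` and `lcc_zscore`, and the adjacency-dictionary representation are all assumptions made for illustration.

```python
import random
import statistics
from collections import deque


def lcc_size(adjacency, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`.

    `adjacency` maps each node to the set of its neighbors (hypothetical format).
    """
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search restricted to the selected nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency.get(node, ()):
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def lcc_zscore(adjacency, disease_nodes, n_random=1000, seed=0):
    """Return (observed LCC size, z-score, empirical p-value).

    Null model (an assumption): node sets of equal size sampled uniformly
    at random from all nodes of the network.
    """
    rng = random.Random(seed)
    all_nodes = sorted(adjacency)
    k = len(set(disease_nodes))
    observed = lcc_size(adjacency, disease_nodes)
    random_sizes = [
        lcc_size(adjacency, rng.sample(all_nodes, k)) for _ in range(n_random)
    ]
    mean = statistics.mean(random_sizes)
    std = statistics.pstdev(random_sizes)
    z = (observed - mean) / std if std > 0 else float("nan")
    # Fraction of random sets whose LCC is at least as large as the observed one.
    p_emp = sum(s >= observed for s in random_sizes) / n_random
    return observed, z, p_emp
```

With this kind of estimate, an empirical p-value of zero only means that none of the `n_random` samples reached the observed size, which is why such values are usually reported as an upper bound (for example, below one over the number of samples).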

Mentions: In recent years, there has been increasing evidence that proteins associated with a particular disease have distinct interactions within the Human Interactome, the cellular network of all physical molecular interactions [1–7]. The pathobiological properties of a disease and its clinical manifestations can be linked to perturbations within these disease neighborhoods, or disease modules [8]. With recent advances in genome-wide disease gene association [9] and high-throughput Interactome mapping [10], we can already pinpoint the approximate location of some disease modules (Fig. 1A). For many diseases, however, a considerable fraction of the disease associations remains unknown [11]. In this paper, we propose a network-based methodology to uncover the disease module associated with a particular phenotype. The algorithm is based on a systematic analysis of the network properties of known disease proteins across 70 diseases, revealing that connectivity significance, rather than connection density, is the most predictive quantity characterizing their interaction patterns. This quantity allows us to systematically explore the local network neighborhood around a given set of known disease proteins, helping us identify promising new disease protein candidates.
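The excerpt does not define connectivity significance. As a rough sketch, it can be read as a hypergeometric p-value: the probability that a protein with a given number of links would have at least the observed number of links to known disease (seed) proteins if its links were placed at random. The function name `connectivity_pvalue` and its parameters are hypothetical, and the exact form used by the authors should be taken from the full article.

```python
from math import comb


def connectivity_pvalue(n_total, n_seeds, degree, seed_links):
    """P(X >= seed_links) for X ~ Hypergeometric(n_total, n_seeds, degree).

    n_total    -- number of proteins in the network
    n_seeds    -- number of known disease (seed) proteins
    degree     -- total number of links of the candidate protein
    seed_links -- number of those links that go to seed proteins
    """
    upper = min(degree, n_seeds)
    favorable = sum(
        comb(n_seeds, i) * comb(n_total - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return favorable / comb(n_total, degree)


# Two candidates with the same number of seed links but different degrees:
# the low-degree protein's links to seeds are far less likely to occur by chance.
p_low_degree = connectivity_pvalue(n_total=1000, n_seeds=50, degree=3, seed_links=3)
p_hub = connectivity_pvalue(n_total=1000, n_seeds=50, degree=100, seed_links=3)
```

The comparison at the end shows why such a measure differs from connection density: a hub with three seed links is unremarkable, while a protein whose only three links all reach seed proteins is strongly significant.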

