A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome.

Ghiassian SD, Menche J, Barabási AL - PLoS Comput. Biol. (2015)

Bottom Line: The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. While numerous heuristic methods exist that successfully pinpoint disease-associated modules, the basic underlying connectivity patterns remain largely unexplored. We find that disease-associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity.


Affiliation: Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America; Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.

ABSTRACT
The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
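The abstract names connectivity significance without defining it. As a minimal sketch, assuming it is formalized as a hypergeometric tail probability (the chance that a protein with a given number of links has at least the observed number of links to seed proteins if the seeds were placed at random), it could be computed as follows. The function name and parameter names are illustrative and not taken from the authors' code.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """Hypergeometric tail probability P(X >= seed_links).

    Assumed model: n_seeds seed proteins are scattered at random over
    n_nodes proteins, and X counts how many of a protein's `degree`
    neighbors are seeds. Smaller values mean more significant connectivity.
    """
    if not (0 <= n_seeds <= n_nodes and 0 <= seed_links <= degree <= n_nodes):
        raise ValueError("inconsistent arguments")
    total = comb(n_nodes, degree)
    upper = min(degree, n_seeds)
    # Sum the probabilities of observing seed_links, seed_links + 1, ... seeds.
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a low-degree protein with 3 of 5 links to seeds
# versus a hub with 5 of 200 links to seeds, in a 1000-node network
# containing 20 seeds.
p_low_degree = connectivity_pvalue(1000, 20, 5, 3)
p_hub = connectivity_pvalue(1000, 20, 200, 5)
print(p_low_degree < p_hub)
```

Under this assumption a hub is not favored simply because it has many links: its seed links are judged against what its degree would produce by chance, so the low-degree protein in the example ranks as more significant.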



pcbi.1004120.g005: Comparison between DIAMOnD and Random Walk (RW). (A,B) Average recovery rates of DIAMOnD and the reference RW algorithm when removing 50% (100 nodes) of 100 generated shells (A) and connectivity (B) modules. (C) Comparison of the biological evidence for proteins identified by DIAMOnD and RW for lysosomal storage diseases. (D) Overlap between identified proteins and immediate neighbors of seed proteins. In contrast to RW, DIAMOnD includes a considerable number of proteins without first-order interactions to seed genes. (E) Comparison of the performance of DIAMOnD and RW across 70 diseases with respect to non-specific disease data. (F) Degree distributions of the identified proteins. DIAMOnD proteins are characterized by the absence of hubs.

Mentions: Fig. 5A,B summarize the comparison between DIAMOnD and RW on the synthetic modules. Because we removed the attribute from half of the module nodes (about 100 nodes), iteration step 100 is a reasonable point of comparison. For both types of synthetic modules, DIAMOnD recovers more removed nodes within its top 100 predictions, whereas RW captures more true hits in its later predictions. In most cases DIAMOnD identifies removed nodes in the early iterations until the recovery rate saturates (Fig. 5A). A higher initial slope corresponds to higher precision, i.e. a higher fraction of true positives among all predictions, TP/(TP+FP). DIAMOnD shows higher precision and sensitivity (recall) in the initial iterations, whereas RW performs better at later iterations once DIAMOnD has saturated. In the context of disease protein identification, high-quality detection of fewer proteins with few false positives is generally more desirable than low-quality detection of hundreds of proteins.
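To make the precision and recall comparison concrete, here is a minimal sketch of how both quantities could be tracked per iteration for any ranked list of predictions. The function name, the node labels and the toy data are hypothetical; this is not the evaluation code used in the paper.

```python
def precision_recall_curve(ranked, hidden):
    """Return (k, precision, recall) after each of the first k predictions.

    ranked: predicted nodes in the order the method outputs them.
    hidden: module nodes whose attribute was removed (the ground truth).
    """
    hidden = set(hidden)
    if not hidden:
        raise ValueError("hidden set must not be empty")
    hits = 0
    curve = []
    for k, node in enumerate(ranked, start=1):
        if node in hidden:
            hits += 1
        precision = hits / k          # TP / (TP + FP)
        recall = hits / len(hidden)   # TP / (TP + FN)
        curve.append((k, precision, recall))
    return curve


# Toy example: 3 hidden nodes, 4 predictions, 2 of which are true hits.
curve = precision_recall_curve(["a", "x", "b", "y"], {"a", "b", "c"})
for k, precision, recall in curve:
    print(k, round(precision, 3), round(recall, 3))
```

A method whose true hits are concentrated at the start of its ranking has high precision at small k, which corresponds to the steep initial slope of the recovery curve described above.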

