A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome.

Ghiassian SD, Menche J, Barabási AL - PLoS Comput. Biol. (2015)

Bottom Line: The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. While numerous heuristic methods exist that successfully pinpoint disease-associated modules, the basic underlying connectivity patterns remain largely unexplored. We find that disease-associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity.

Affiliation: Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America; Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.

ABSTRACT
The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbations within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite for a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease-associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease-associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
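
The abstract does not spell out how connectivity significance is computed. The following is a minimal sketch, not the authors' reference implementation, assuming that the significance of a candidate protein with k interactions, k_s of which link to the current module, is the hypergeometric tail probability of seeing at least k_s such links by chance in a network of N proteins, and that the module grows by one protein per iteration, always adding the candidate with the smallest p-value. The function names `hypergeom_sf` and `diamond_sketch` and the toy network are hypothetical, and any refinements the published method may include (such as extra weighting of seed proteins) are omitted.

```python
from math import comb
from typing import Dict, List, Set

# Illustrative sketch (assumption): connectivity significance as a hypergeometric
# tail probability; this is not the authors' code.

def hypergeom_sf(k_s: int, k: int, s0: int, n: int) -> float:
    """P(X >= k_s) for X ~ Hypergeometric(population n, s0 module nodes, k draws)."""
    upper = min(k, s0)
    tail = sum(comb(s0, x) * comb(n - s0, k - x) for x in range(k_s, upper + 1))
    return tail / comb(n, k)

def diamond_sketch(adj: Dict[str, Set[str]], seeds: Set[str], iterations: int) -> List[str]:
    """Hypothetical greedy module expansion by connectivity significance.

    adj: undirected adjacency sets for every node in the network.
    Returns the nodes added to the module, in the order they were added.
    """
    n = len(adj)
    module = set(seeds)
    added: List[str] = []
    for _ in range(iterations):
        # Candidates are nodes outside the module with at least one link into it.
        candidates = {v for u in module for v in adj[u]} - module
        best, best_p = None, float("inf")
        for v in sorted(candidates):  # sorted: ties go to the alphabetically first node
            k = len(adj[v])             # total degree of the candidate
            k_s = len(adj[v] & module)  # links into the current module
            p = hypergeom_sf(k_s, k, len(module), n)
            if p < best_p:
                best, best_p = v, p
        if best is None:  # no connected candidates left
            break
        module.add(best)
        added.append(best)
    return added

# Toy network (hypothetical): seeds A and B.
toy = {
    "A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"A", "E", "F", "G"}, "E": {"D"}, "F": {"D"}, "G": {"D"},
}
print(diamond_sketch(toy, {"A", "B"}, 3))  # ['C', 'D', 'E']
```

In this toy example C is added before D even though D has the higher degree: both of C's two links point into the seed set (p = 1/21), whereas only one of D's four links does (p = 30/35). Because the p-values are recomputed against the growing module at every step, a protein's rank can change as its neighbors join.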

Fig. 4. Biological evaluation of DIAMOnD. (A) Validation of the DIAMOnD genes based on Gene Ontology terms (see Materials & Methods). (B) The significance of the similarity between DIAMOnD genes and seed genes suggests a cutoff of ∼200 DIAMOnD genes. (C) Network representation of the lysosomal storage diseases module. (D,E) Summary of the validation for all 70 disease modules based on Gene Ontology (D) and biological pathways (E). (F) Fraction of seed proteins that are contained in the largest connected component (LCC) of the DIAMOnD module for varying iteration steps. The distributions show the values obtained from 70 diseases. By introducing DIAMOnD proteins, previously disconnected seed proteins become part of the LCC.
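
Panel F tracks how many seed proteins fall into the largest connected component of the growing module. A minimal sketch of that measurement follows, assuming that the module at a given iteration is the subgraph induced by the seed proteins plus the DIAMOnD proteins added so far. The function names and the four-node example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set

def largest_connected_component(adj: Dict[str, Set[str]], nodes: Iterable[str]) -> Set[str]:
    """Largest connected component of the subgraph induced by `nodes` (breadth-first search).

    Ties between equally large components are broken arbitrarily.
    """
    remaining = set(nodes)
    best: Set[str] = set()
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Only follow edges that stay inside the induced subgraph.
            for v in adj.get(u, set()):
                if v in remaining:
                    remaining.remove(v)
                    component.add(v)
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return best

def seed_fraction_in_lcc(adj: Dict[str, Set[str]], seeds: Iterable[str],
                         added: Iterable[str]) -> float:
    """Fraction of seed proteins inside the LCC of the module (seeds plus added nodes)."""
    seed_set = set(seeds)
    lcc = largest_connected_component(adj, seed_set | set(added))
    return len(lcc & seed_set) / len(seed_set)

# Hypothetical path A-B-C: seeds A and C are disconnected until B joins the module.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
print(seed_fraction_in_lcc(path, ["A", "C"], []))     # 0.5
print(seed_fraction_in_lcc(path, ["A", "C"], ["B"]))  # 1.0
```

The example reproduces, on a tiny scale, the effect described in the caption: adding a single non-seed protein bridges two previously disconnected seeds.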

Mentions: Next we explore the performance of DIAMOnD on 70 real diseases. Since the full set of disease proteins is, by definition, unknown, we cannot assess the performance directly in terms of true positives/negatives. We therefore validate the DIAMOnD disease modules using publicly available gene annotation data, namely Gene Ontology [27] and biological pathways from MSigDB [28]. For each disease, we determine a reference set of all GO terms and pathways that are significantly enriched among the seed proteins. We then compare the annotations of each DIAMOnD gene to this reference set, assuming that proteins whose annotations resemble those of the seed genes are more likely to be disease associated as well [1,29–32] (see Materials & Methods for details). Fig. 4A,B shows examples of this validation for lysosomal storage diseases, based on pathway similarity. The first ∼200 DIAMOnD genes participate in important seed pathways at a rate similar to that of the seed proteins themselves and significantly higher than random expectation. In total, 58 of the 70 disease modules can be validated by GO terms or pathways, and 46 by both. Fig. 4D,E summarizes the validation for all 70 diseases: the majority of the detected modules perform several times better than random expectation, particularly in the first 50–100 iterations.
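
The Materials & Methods section that defines the exact similarity score is not part of this excerpt. As a hedged illustration only, the sketch below scores a ranked list of DIAMOnD genes by the fraction of its first n genes annotated with at least one term from the seed-derived reference set, and estimates a random expectation by sampling genes from a background list. The enrichment test that produces the reference set is assumed to have been run already, and all function names, gene names, and term identifiers are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set

# Hypothetical data layout: gene -> set of annotation terms (GO terms or pathways).
Annotations = Dict[str, Set[str]]

def has_reference_term(gene: str, annotations: Annotations, reference: Set[str]) -> bool:
    """True if the gene carries at least one term from the seed-enriched reference set."""
    return bool(annotations.get(gene, set()) & reference)

def cumulative_hit_rates(ranked_genes: Sequence[str], annotations: Annotations,
                         reference: Set[str]) -> List[float]:
    """Hit rate among the first n genes of a ranked list, for n = 1 .. len(ranked_genes)."""
    rates: List[float] = []
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        hits += has_reference_term(gene, annotations, reference)  # bool counts as 0 or 1
        rates.append(hits / n)
    return rates

def random_hit_rate(background: Sequence[str], n: int, annotations: Annotations,
                    reference: Set[str], trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the hit rate for n genes drawn at random from the background."""
    rng = random.Random(seed)
    pool = sorted(background)  # fixed order so a given seed is reproducible
    hits = 0
    for _ in range(trials):
        hits += sum(has_reference_term(g, annotations, reference) for g in rng.sample(pool, n))
    return hits / (trials * n)

# Hypothetical annotations and a seed-derived reference set containing only term_x.
ann = {"g1": {"term_x"}, "g2": {"term_y"}, "g3": {"term_x", "term_z"}, "g4": set()}
ref = {"term_x"}
print([round(r, 2) for r in cumulative_hit_rates(["g1", "g2", "g3"], ann, ref)])  # [1.0, 0.5, 0.67]
print(random_hit_rate(["g1", "g2", "g3", "g4"], 2, ann, ref))  # an estimate close to 0.5
```

Comparing the cumulative rate of the DIAMOnD list with the rate among the seed genes and with the random estimate mirrors, in simplified form, the comparison described above.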

