Limits...
HOODS: finding context-specific neighborhoods of proteins, chemicals and diseases.

Palleja A, Jensen LJ - PeerJ (2015)

Bottom Line: Clustering algorithms are often used to find groups relevant in a specific context; however, they are not informed about this context.We present a simple algorithm, HOODS, which identifies context-specific neighborhoods of entities from a similarity matrix and a list of entities specifying the context.We illustrate its applicability by finding disease-specific neighborhoods of functionally associated proteins, kinase-specific neighborhoods of structurally similar inhibitors, and physiological-system-specific neighborhoods of interconnected diseases.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen , Copenhagen N , Denmark ; The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen , Copenhagen Ø , Denmark.

ABSTRACT
Clustering algorithms are often used to find groups relevant in a specific context; however, they are not informed about this context. We present a simple algorithm, HOODS, which identifies context-specific neighborhoods of entities from a similarity matrix and a list of entities specifying the context. We illustrate its applicability by finding disease-specific neighborhoods of functionally associated proteins, kinase-specific neighborhoods of structurally similar inhibitors, and physiological-system-specific neighborhoods of interconnected diseases. HOODS can be used via a simple interface at http://hoods.jensenlab.org, from where the source code can also be downloaded.

No MeSH data available.


Related in: MedlinePlus

Outline of the algorithm.The example illustrates how HOODs calculates the neighborhood scores for each entity and selects a non-redundant subset of neighborhoods (A) For two seeds namely E27 and E9, we show the nearest neighbors sorted by similarity scores. Blue nodes indicate the labeled entities. Looping through the sorted nearest neighbors, the neighborhood score (using the default alpha value of 0.8) varies as both the running sum of the label scores and the neighborhood size change. Notice that the neighborhood score is only computed when the neighborhood contains at least two entities with a label score bigger than 0. The boxes delineate a new possible neighborhood each time a new labeled entity is encountered (these neighborhoods have their score in bold). (B) The algorithm next ranks the neighborhoods by score. To discard redundant neighborhoods, the algorithm loops over the ranked neighborhoods and counts: (i) the number of labeled entities not yet seen in higher ranked neighborhoods (“New labeled entities”), (ii) the number of entities not yet seen in higher ranked neighborhoods (“New entities”), and (iii) the total number of entities from the similarity matrix that have been used to build the set of neighborhoods (“Total entities used”). For instance, one of the neighborhoods obtained using E27 as seed is discarded because it adds no new labeled entities (box and numbers shown in gray).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4493695&req=5

fig-1: Outline of the algorithm.The example illustrates how HOODs calculates the neighborhood scores for each entity and selects a non-redundant subset of neighborhoods (A) For two seeds namely E27 and E9, we show the nearest neighbors sorted by similarity scores. Blue nodes indicate the labeled entities. Looping through the sorted nearest neighbors, the neighborhood score (using the default alpha value of 0.8) varies as both the running sum of the label scores and the neighborhood size change. Notice that the neighborhood score is only computed when the neighborhood contains at least two entities with a label score bigger than 0. The boxes delineate a new possible neighborhood each time a new labeled entity is encountered (these neighborhoods have their score in bold). (B) The algorithm next ranks the neighborhoods by score. To discard redundant neighborhoods, the algorithm loops over the ranked neighborhoods and counts: (i) the number of labeled entities not yet seen in higher ranked neighborhoods (“New labeled entities”), (ii) the number of entities not yet seen in higher ranked neighborhoods (“New entities”), and (iii) the total number of entities from the similarity matrix that have been used to build the set of neighborhoods (“Total entities used”). For instance, one of the neighborhoods obtained using E27 as seed is discarded because it adds no new labeled entities (box and numbers shown in gray).

Mentions: HOODS identifies groups of related entities starting from two inputs: (1) a similarity matrix of the entities; and (2) a set of scores that associate entities with the context of interest (si). The algorithm considers every entity in the matrix as a possible seed for a neighborhood. For each seed it goes through the nearest neighbors sorted by similarity and calculates a neighborhood score (Sj) at every step (Fig. 1A). This score is defined as follows: \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy}\usepackage{upgreek}\usepackage{mathrsfs}\setlength{\oddsidemargin}{-69pt}\begin{document}}{}\begin{eqnarray*} {S}_{j}={\left(\displaystyle \sum _{i=1}^{j}{s}_{i}/j\right)}^{\alpha }\times {\left(\displaystyle \sum _{i=1}^{j}{s}_{i}\right)}^{1-\alpha }=\left(\displaystyle \sum _{i=1}^{j}{s}_{i}\right)/{j}^{\alpha }. \end{eqnarray*}\end{document}Sj=∑i=1jsi/jα×∑i=1jsi1−α=∑i=1jsi/jα. The parameter α is needed to set how much the user wants to penalize big neighborhoods and it ranges from 0 to 1. In this equation we take into account the label score density in our neighborhoods and the total label score of them . The score is a weighted geometric average from these two components; ideally a neighborhood should have both a high density of labels and a large number of labeled entities.


HOODS: finding context-specific neighborhoods of proteins, chemicals and diseases.

Palleja A, Jensen LJ - PeerJ (2015)

Outline of the algorithm.The example illustrates how HOODs calculates the neighborhood scores for each entity and selects a non-redundant subset of neighborhoods (A) For two seeds namely E27 and E9, we show the nearest neighbors sorted by similarity scores. Blue nodes indicate the labeled entities. Looping through the sorted nearest neighbors, the neighborhood score (using the default alpha value of 0.8) varies as both the running sum of the label scores and the neighborhood size change. Notice that the neighborhood score is only computed when the neighborhood contains at least two entities with a label score bigger than 0. The boxes delineate a new possible neighborhood each time a new labeled entity is encountered (these neighborhoods have their score in bold). (B) The algorithm next ranks the neighborhoods by score. To discard redundant neighborhoods, the algorithm loops over the ranked neighborhoods and counts: (i) the number of labeled entities not yet seen in higher ranked neighborhoods (“New labeled entities”), (ii) the number of entities not yet seen in higher ranked neighborhoods (“New entities”), and (iii) the total number of entities from the similarity matrix that have been used to build the set of neighborhoods (“Total entities used”). For instance, one of the neighborhoods obtained using E27 as seed is discarded because it adds no new labeled entities (box and numbers shown in gray).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4493695&req=5

fig-1: Outline of the algorithm.The example illustrates how HOODs calculates the neighborhood scores for each entity and selects a non-redundant subset of neighborhoods (A) For two seeds namely E27 and E9, we show the nearest neighbors sorted by similarity scores. Blue nodes indicate the labeled entities. Looping through the sorted nearest neighbors, the neighborhood score (using the default alpha value of 0.8) varies as both the running sum of the label scores and the neighborhood size change. Notice that the neighborhood score is only computed when the neighborhood contains at least two entities with a label score bigger than 0. The boxes delineate a new possible neighborhood each time a new labeled entity is encountered (these neighborhoods have their score in bold). (B) The algorithm next ranks the neighborhoods by score. To discard redundant neighborhoods, the algorithm loops over the ranked neighborhoods and counts: (i) the number of labeled entities not yet seen in higher ranked neighborhoods (“New labeled entities”), (ii) the number of entities not yet seen in higher ranked neighborhoods (“New entities”), and (iii) the total number of entities from the similarity matrix that have been used to build the set of neighborhoods (“Total entities used”). For instance, one of the neighborhoods obtained using E27 as seed is discarded because it adds no new labeled entities (box and numbers shown in gray).
Mentions: HOODS identifies groups of related entities starting from two inputs: (1) a similarity matrix of the entities; and (2) a set of scores that associate entities with the context of interest (si). The algorithm considers every entity in the matrix as a possible seed for a neighborhood. For each seed it goes through the nearest neighbors sorted by similarity and calculates a neighborhood score (Sj) at every step (Fig. 1A). This score is defined as follows: \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy}\usepackage{upgreek}\usepackage{mathrsfs}\setlength{\oddsidemargin}{-69pt}\begin{document}}{}\begin{eqnarray*} {S}_{j}={\left(\displaystyle \sum _{i=1}^{j}{s}_{i}/j\right)}^{\alpha }\times {\left(\displaystyle \sum _{i=1}^{j}{s}_{i}\right)}^{1-\alpha }=\left(\displaystyle \sum _{i=1}^{j}{s}_{i}\right)/{j}^{\alpha }. \end{eqnarray*}\end{document}Sj=∑i=1jsi/jα×∑i=1jsi1−α=∑i=1jsi/jα. The parameter α is needed to set how much the user wants to penalize big neighborhoods and it ranges from 0 to 1. In this equation we take into account the label score density in our neighborhoods and the total label score of them . The score is a weighted geometric average from these two components; ideally a neighborhood should have both a high density of labels and a large number of labeled entities.

Bottom Line: Clustering algorithms are often used to find groups relevant in a specific context; however, they are not informed about this context.We present a simple algorithm, HOODS, which identifies context-specific neighborhoods of entities from a similarity matrix and a list of entities specifying the context.We illustrate its applicability by finding disease-specific neighborhoods of functionally associated proteins, kinase-specific neighborhoods of structurally similar inhibitors, and physiological-system-specific neighborhoods of interconnected diseases.

View Article: PubMed Central - HTML - PubMed

Affiliation: The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen , Copenhagen N , Denmark ; The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen , Copenhagen Ø , Denmark.

ABSTRACT
Clustering algorithms are often used to find groups relevant in a specific context; however, they are not informed about this context. We present a simple algorithm, HOODS, which identifies context-specific neighborhoods of entities from a similarity matrix and a list of entities specifying the context. We illustrate its applicability by finding disease-specific neighborhoods of functionally associated proteins, kinase-specific neighborhoods of structurally similar inhibitors, and physiological-system-specific neighborhoods of interconnected diseases. HOODS can be used via a simple interface at http://hoods.jensenlab.org, from where the source code can also be downloaded.

No MeSH data available.


Related in: MedlinePlus