HOODS: finding context-specific neighborhoods of proteins, chemicals and diseases.

Palleja A, Jensen LJ - PeerJ (2015)

Affiliation: The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark; The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen Ø, Denmark.

ABSTRACT
Clustering algorithms are often used to find groups relevant in a specific context; however, they are not informed about this context. We present a simple algorithm, HOODS, which identifies context-specific neighborhoods of entities from a similarity matrix and a list of entities specifying the context. We illustrate its applicability by finding disease-specific neighborhoods of functionally associated proteins, kinase-specific neighborhoods of structurally similar inhibitors, and physiological-system-specific neighborhoods of interconnected diseases. HOODS can be used via a simple interface at http://hoods.jensenlab.org, from where the source code can also be downloaded.


Figure 2: Validation of HOODS and estimation of the alpha parameter. The bar chart shows the number of disease proteins correctly recovered within the first 25, 50 or 100 proteins used from the similarity matrix in the leave-one-out cross-validation of the method. The error bars represent the 95% confidence interval according to the binomial distribution when using 100 proteins from the similarity matrix. For alpha values between 0.6 and 1, we observe similar performance, with 0.8 being the optimum.
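
The caption does not say which binomial interval was used for the error bars. As a rough illustration only, the sketch below computes a 95% Wilson score interval for a recovered proportion such as the 80 of 100 proteins reported for alpha = 0.8. The choice of the Wilson method and the function name `wilson_interval` are assumptions made here, not details taken from the article.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the article only says "according to the Binomial
    distribution"; the Wilson method is one common choice, used here
    purely for illustration. z = 1.959964 corresponds to 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 80 of 100 benchmark proteins recovered (the count reported for alpha = 0.8).
low, high = wilson_interval(80, 100)
print(f"95% interval for the recovered fraction: {low:.3f} to {high:.3f}")
```

Multiplying the two bounds by 100 gives error-bar limits on the same scale as the bar heights. An exact (Clopper-Pearson) interval would be slightly wider.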

Mentions: We first used HOODS to identify disease-related protein neighborhoods, using the human protein network from STRING (Szklarczyk et al., 2011) as the similarity matrix and text-mined disease–protein associations from DISEASES (Pletscher-Frankild et al., 2015) as labels. To validate our method and to estimate a value for α, we performed leave-one-out cross-validation on a set of 100 proteins encoded by single-gene loci associated with 32 polygenic diseases in OMIM (Amberger, Bocchini & Hamosh, 2011). Going through the ranked neighborhoods, we counted the total number of unique proteins encountered before finding the left-out protein, including all the proteins in the neighborhood containing it (Fig. 1B). HOODS showed similarly good performance for α ranging from 0.6 to 1.0 (Fig. 2). We chose 0.8 as the default value for α because it is both the middle of this range and the value that gave the best performance, recovering 80 of the 100 proteins from the OMIM benchmark set among the first 100 proteins used to build the networks (Fig. 2). To show that the good performance is not purely due to disease proteins being more studied, we redid the leave-one-out cross-validation using a randomly chosen one of the other 31 diseases as labels. This recovered only 1 of the 100 proteins.
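
The counting step of the cross-validation can be sketched as follows. This is a minimal illustration, not the HOODS source code: it assumes the ranked neighborhoods are already available as a list of protein sets, and the function names and protein identifiers are hypothetical. It also assumes that the left-out protein itself is included in the count and that "within the first N proteins" means a count of at most N.

```python
from typing import Iterable, Optional, Sequence, Set


def proteins_before_recovery(
    ranked_neighborhoods: Sequence[Iterable[str]],
    left_out: str,
) -> Optional[int]:
    """Count the unique proteins seen up to and including the neighborhood
    that contains the left-out protein. Returns None if it is never found.

    Assumption: the count includes the left-out protein itself.
    """
    seen: Set[str] = set()
    for neighborhood in ranked_neighborhoods:
        members = set(neighborhood)
        seen |= members  # proteins shared between neighborhoods count once
        if left_out in members:
            return len(seen)
    return None


def recovered_within(counts: Iterable[Optional[int]], cutoff: int) -> int:
    """Number of left-out proteins recovered within `cutoff` unique proteins."""
    return sum(1 for c in counts if c is not None and c <= cutoff)


# Toy, made-up neighborhoods in rank order (hypothetical identifiers, not real data).
ranked = [{"P1", "P2"}, {"P2", "P3", "P4"}, {"P5"}]
counts = [proteins_before_recovery(ranked, p) for p in ("P1", "P4", "P9")]
print(counts)
print(recovered_within(counts, cutoff=3))
```

In a full leave-one-out run, each benchmark protein would be removed from the labels in turn, the neighborhoods recomputed, and `recovered_within` evaluated at cutoffs of 25, 50 and 100 to obtain bar heights like those in Fig. 2.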

