Relating diseases by integrating gene associations and information flow through protein interaction network.

Hamaneh MB, Yu YK - PLoS ONE (2014)

Bottom Line: We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

ABSTRACT
Identifying similar diseases could potentially provide deeper understanding of their underlying causes, and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underpinning molecular interactions and biological pathways. We have thus devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently their encoding genes) by using information flow from a disease to the protein interaction network and back. Similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations by using the weights assigned to the genes to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding the diseases in the same disease family and, more importantly, sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. It is also shown that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives for biological pathways than the diseases themselves. This lends support to the view that our similarity measure is a good indicator of relatedness of biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
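The similarity definition in the abstract (the cosine of the angle between two disease weight vectors) can be illustrated with a minimal sketch. The function name `cosine_similarity` and the example weights are hypothetical and chosen only for illustration; the sketch assumes both vectors list weights over the same proteins in the same order, and it does not reproduce how the authors derive the weights from information flow.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical weights over the same four proteins for two diseases.
weights_a = [0.5, 0.3, 0.2, 0.0]
weights_b = [0.4, 0.4, 0.0, 0.2]
similarity = cosine_similarity(weights_a, weights_b)
```

Because the cosine depends only on the direction of each vector, rescaling all weights of one disease by a positive constant leaves the similarity unchanged.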



pone-0110936-g002: The probabilities of having common term associations or being siblings. (A) The probabilities of finding a pair of diseases with (1) common GO/KEGG terms (red), (2) the same parents and common associations (blue), and (3) the same parents without shared biological terms (green) are shown. Here only pairs with a defined term similarity are considered. (B) For pairs with undefined term similarity (pairs with at least one member not associated with any biological terms), the distribution of siblings is plotted as a function of correlation. (C) and (D) show similar quantities to (A) and (B), respectively, when the biological term associations are directly retrieved from the KEGG DISEASE database.

Mentions: Since term similarity was undefined for a large number of disease pairs, the pairs were divided into two sets: those with defined (first set) and those with undefined (second set) term similarities. For the first set, Fig. 2 (A) illustrates the probability that a pair consists of siblings without shared biological terms (in green), of siblings with common term associations (in blue), and of diseases with common GO/KEGG terms (in red). The figure shows, when […], a rise in the probability that a disease pair has common biological associations as correlation increases. The figure also indicates, when […], that disease pairs with higher correlations are more likely to be siblings if they have shared terms. However, the siblings without shared terms have an almost flat (correlation-independent) distribution, although the percentage of such pairs is very small (about 0.5%). One possible explanation for these results is that the increase in the percentage of siblings among highly correlated diseases is in fact due to an increase in the percentage of pairs with shared terms. In other words, in the high-correlation regime, most of the siblings are a subset of the disease pairs with shared GO/KEGG terms. Figure 2 (B) shows how the probability of being siblings varies with correlation for the second set of disease pairs (the ones with undefined term similarities).
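Curves of this kind (a probability plotted against correlation) can be estimated by grouping disease pairs into correlation bins and taking the fraction of pairs in each bin that have the property of interest. The following is a minimal sketch under stated assumptions: the function name `binned_fraction`, the number of bins, the assumption that correlations lie in [0, 1], and the example data are all hypothetical and are not taken from the paper.

```python
def binned_fraction(pairs, n_bins=5):
    """Fraction of flagged pairs in each equal-width correlation bin.

    pairs: iterable of (correlation, flag) tuples, where correlation is
    assumed to lie in [0, 1] and flag says whether the pair has the
    property of interest (for example, shared terms or being siblings).
    Returns one fraction per bin, or None for a bin with no pairs.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for corr, flag in pairs:
        if not 0.0 <= corr <= 1.0:
            raise ValueError("correlation is assumed to lie in [0, 1]")
        # A correlation of exactly 1.0 is placed in the last bin.
        idx = min(int(corr * n_bins), n_bins - 1)
        counts[idx] += 1
        if flag:
            hits[idx] += 1
    return [h / c if c else None for h, c in zip(hits, counts)]


# Hypothetical disease pairs: (correlation, has shared GO/KEGG terms).
example_pairs = [(0.10, False), (0.15, False), (0.50, True),
                 (0.55, False), (0.95, True)]
fractions = binned_fraction(example_pairs, n_bins=2)
```

Empty bins are reported as None rather than zero, so that a missing estimate is not mistaken for a probability of zero.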

