Relating diseases by integrating gene associations and information flow through protein interaction network.

Hamaneh MB, Yu YK - PLoS ONE (2014)

Bottom Line: We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.

Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

ABSTRACT
Identifying similar diseases could potentially provide deeper understanding of their underlying causes, and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underpinning molecular interactions and biological pathways. We have thus devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently their encoding genes) by using information flow from a disease to the protein interaction network and back. Similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations by using the weights assigned to the genes to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding the diseases in the same disease family and, more importantly, sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. It is also shown that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives for biological pathways than the diseases themselves. This lends support to the view that our similarity measure is a good indicator of relatedness of biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
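As a minimal sketch of the similarity step only, assuming each disease's gene weights are already available as a sparse mapping from gene identifier to weight (the information-flow computation that produces those weights is not shown, and the gene names and weights used below are hypothetical), the cosine of the angle between two weight vectors can be computed as:

```python
import math
from typing import Mapping


def cosine_similarity(w_a: Mapping[str, float], w_b: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse gene-weight vectors.

    Genes absent from a mapping are treated as having weight 0.
    Returns 0.0 if either vector is all zeros (the angle is undefined).
    """
    # Only genes present in both vectors contribute to the dot product.
    shared = w_a.keys() & w_b.keys()
    dot = sum(w_a[g] * w_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in w_a.values()))
    norm_b = math.sqrt(sum(v * v for v in w_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For two toy diseases with weights {"GENE1": 0.6, "GENE2": 0.8} and {"GENE2": 0.8, "GENE3": 0.6}, only GENE2 contributes to the dot product (0.8 × 0.8) and both vectors have unit norm, so the similarity is approximately 0.64. Treating absent genes as zero lets the weight vectors stay sparse while still spanning the whole proteome.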

Fig. 3. Comparison with MimMiner. (A) The inset shows the weighted number () of disease pairs with shared KEGG pathways that were ranked higher than  by MimMiner (red) and by our method (blue). Also shown in the inset (green) is the weighted number of pairs with common term associations that were missed (ranked lower) by MimMiner but identified (ranked higher) by our model. The main panel plots the same quantities for the proposed method after excluding pairs that are obvious candidates for being related. The closeness of the blue and green curves indicates that the non-apparent candidates found by our method are largely missed by MimMiner. (B) Inverse of the average normalized rank versus the term similarity cutoff. At large similarity cutoffs, the higher the average normalized rank (the smaller  and thus the larger ), the better the agreement between the quality scores (cosine similarity or the MimMiner score) and the KEGG annotation.
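The quantity plotted in panel (B) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the normalized rank of a pair is taken to be its position in the score-sorted list divided by the number of pairs (so the top pair has normalized rank 1/N), ties are broken by insertion order, and each pair's KEGG term similarity is supplied as input because its exact definition is not given here. The function names are hypothetical.

```python
from typing import Dict, Hashable, Mapping


def normalized_ranks(scores: Mapping[Hashable, float]) -> Dict[Hashable, float]:
    """Map each pair to rank / N, where rank 1 is the highest score.

    sorted() is stable (also with reverse=True), so tied scores keep
    their insertion order.
    """
    ordered = sorted(scores, key=lambda pair: scores[pair], reverse=True)
    n = len(ordered)
    return {pair: (i + 1) / n for i, pair in enumerate(ordered)}


def inverse_avg_normalized_rank(
    scores: Mapping[Hashable, float],
    term_similarity: Mapping[Hashable, float],
    cutoff: float,
) -> float:
    """1 / (mean normalized rank) of the scored pairs with term similarity >= cutoff.

    Returns NaN when no scored pair passes the cutoff.
    """
    ranks = normalized_ranks(scores)
    selected = [
        ranks[pair]
        for pair, sim in term_similarity.items()
        if sim >= cutoff and pair in ranks
    ]
    if not selected:
        return float("nan")
    # 1 / mean(selected) == len(selected) / sum(selected)
    return len(selected) / sum(selected)
```

Under this convention a larger inverse means that pairs sharing KEGG terms sit closer to the top of the method's ranking, which matches the reading of panel (B) given in the caption.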

Mentions: The results (MimMiner in red, our method in blue) are shown in the inset of Fig. 3 (A). The green curve shows the weighted number of disease pairs identified (ranked higher than ) by our method but missed (ranked lower) by MimMiner. Both methods show similar trends, although MimMiner performs better (a faster rise in ). This is expected, because MimMiner is based on mining the literature, which is also the source of the manually curated data in the KEGG DISEASE database. Importantly, however, the two methods do not find the same pairs, especially among less apparent relationships. To examine this, we first excluded the disease pairs that were obvious candidates for being related, i.e., sibling diseases and pairs with common gene associations (3847 pairs were excluded, leaving 332763). We then recomputed the blue and green curves, shown in the main panel of Fig. 3 (A). The closeness of these two curves indicates that, for non-apparent relationships, the disease pairs identified by our method are largely missed by MimMiner. In Fig. 3 (A), about 87% of the pairs ranked higher than  (equivalent to a correlation of  and a MimMiner score of 0.41) by the method presented here were missed by MimMiner.
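The exclusion step and the "missed" fraction can be sketched as below. This is a simplified, unweighted illustration with hypothetical names: siblings are modeled as diseases that share a parent term in a hypothetical `parent` mapping, gene associations come from a hypothetical `genes` mapping, and a pair counts as missed when it lies in our top-k but outside MimMiner's top-k under the same cutoff k. The weighting by shared KEGG pathways used for the curves in Fig. 3 is not reproduced.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Set

Pair = FrozenSet[str]  # unordered disease pair


def obvious_pairs(
    pairs: Iterable[Pair],
    parent: Mapping[str, str],
    genes: Mapping[str, Set[str]],
) -> Set[Pair]:
    """Pairs that are siblings (same parent term) or share an associated gene."""
    obvious = set()
    for pair in pairs:
        a, b = tuple(pair)
        siblings = a in parent and parent.get(a) == parent.get(b)
        shared_gene = bool(genes.get(a, set()) & genes.get(b, set()))
        if siblings or shared_gene:
            obvious.add(pair)
    return obvious


def top_k(scores: Mapping[Pair, float], k: int) -> Set[Pair]:
    """The k highest-scoring pairs."""
    return set(sorted(scores, key=lambda p: scores[p], reverse=True)[:k])


def fraction_missed(
    ours: Dict[Pair, float],
    theirs: Dict[Pair, float],
    excluded: Set[Pair],
    k: int,
) -> float:
    """Fraction of our top-k non-obvious pairs that the other method ranks below k."""
    ours_kept = {p: s for p, s in ours.items() if p not in excluded}
    theirs_kept = {p: s for p, s in theirs.items() if p not in excluded}
    our_top = top_k(ours_kept, k)
    their_top = top_k(theirs_kept, k)
    if not our_top:
        return 0.0
    return len(our_top - their_top) / len(our_top)
```

Sweeping k and recording the count of missed pairs, rather than the fraction, would give an unweighted analogue of the green curve.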
