Relating diseases by integrating gene associations and information flow through protein interaction network.

Hamaneh MB, Yu YK - PLoS ONE (2014)

Bottom Line: We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

ABSTRACT
Identifying similar diseases could potentially provide deeper understanding of their underlying causes, and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underpinning molecular interactions and biological pathways. We have thus devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently their encoding genes) by using information flow from a disease to the protein interaction network and back. Similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations by using the weights assigned to the genes to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding the diseases in the same disease family and, more importantly, sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. It is also shown that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives for biological pathways than the diseases themselves. This lends support to the view that our similarity measure is a good indicator of relatedness of biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
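The similarity definition in the abstract (the cosine of the angle between two disease weight vectors) can be illustrated with a minimal sketch. The function name `cosine_similarity` and the toy weight vectors are hypothetical and chosen only for illustration; the information-flow step that produces the actual protein weights is not shown here and is not reproduced from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors.

    Returns None when either vector has zero length, because the
    angle is undefined in that case.
    """
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return dot / (norm_u * norm_v)


# Hypothetical toy weights over four proteins (not data from the paper).
disease_a = [0.5, 0.3, 0.0, 0.2]
disease_b = [0.4, 0.4, 0.1, 0.1]
disease_c = [0.0, 0.0, 1.0, 0.0]

sim_ab = cosine_similarity(disease_a, disease_b)  # overlapping weights
sim_ac = cosine_similarity(disease_a, disease_c)  # no shared weight
```

In this toy setting, diseases whose weight is concentrated on the same proteins score close to 1, while diseases with no shared weight score 0.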


Figure 1 (pone-0110936-g001): The relation between the results of enrichment analysis and the average correlation. The figure shows the percentage of diseases for which GO/KEGG terms were identified by Saddlesum as a function of average correlation. To facilitate the calculation, we sorted all average correlations in ascending order and placed them into bins, each containing the same number of diseases. The percentage is then measured by the number of diseases with GO/KEGG term hit(s) per bin. For very low average correlations, the percentage is significantly lower.

Mentions: Interestingly, there was a significant difference in the percentage of pairs with an undefined value between disease pairs with low correlations and those with high correlations. For example, the value was undefined for 23% of disease pairs in the high-correlation range, as opposed to 64% of pairs in the low-correlation range. This can be understood through the fact that the percentage of diseases that had been successfully assigned one or more GO/KEGG terms by Saddlesum was smaller for diseases with very low average correlations. This behavior is shown in Fig. 1. After sorting the average correlations into ascending order and placing them in bins, each containing the same number of diseases, we computed the average correlation in a bin and, in the same bin, the number of diseases that had one or more GO/KEGG term hits. In Fig. 1, the resulting percentage is plotted versus the average correlation per bin, and the aforementioned behavior is clearly displayed.
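The binning procedure described for Fig. 1 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `percent_with_hits_per_bin`, the bin size, and the toy inputs are hypothetical, and the bin size used in the paper is not given in this text.

```python
def percent_with_hits_per_bin(avg_corr, has_hit, bin_size):
    """Sort diseases by average correlation, split them into consecutive
    bins of `bin_size` diseases, and return one (mean average correlation,
    percentage of diseases with a term hit) pair per bin.

    The last bin may hold fewer diseases if the total is not a
    multiple of `bin_size`.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if len(avg_corr) != len(has_hit):
        raise ValueError("inputs must have the same length")
    # Indices of diseases ordered by ascending average correlation.
    order = sorted(range(len(avg_corr)), key=lambda i: avg_corr[i])
    rows = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        mean_corr = sum(avg_corr[i] for i in idx) / len(idx)
        pct_hits = 100.0 * sum(1 for i in idx if has_hit[i]) / len(idx)
        rows.append((mean_corr, pct_hits))
    return rows


# Hypothetical toy input: six diseases, bins of three (not data from the paper).
toy_avg_corr = [0.01, 0.30, 0.02, 0.25, 0.03, 0.40]
toy_has_hit = [False, True, False, True, True, True]
toy_rows = percent_with_hits_per_bin(toy_avg_corr, toy_has_hit, bin_size=3)
```

Plotting the second element of each pair against the first would give a curve of the kind the figure describes.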

