Limits...
Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach.

Zhang W, Sun F, Jiang R - BMC Bioinformatics (2011)

Bottom Line: We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC).The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method.The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 10084, China.

ABSTRACT

Background: The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, and thus greatly restrict the scope of application of such methods. Meanwhile, independently constructed and maintained PPI networks are usually quite diverse in coverage and quality, making the selection of a suitable PPI network inevitable but difficult.

Methods: We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach, and we derive an analytic form for Bayes factor that naturally measures the strength of association between a query disease and a candidate gene and thus can be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.

Results: We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.

Conclusions: The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

Show MeSH

Related in: MedlinePlus

ROC curves of the five PPI networks on random controls (A) and linkage intervals (B). AUC scores for the validation of random controls are average of 10 independent runs.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3044265&req=5

Figure 2: ROC curves of the five PPI networks on random controls (A) and linkage intervals (B). AUC scores for the validation of random controls are average of 10 independent runs.

Mentions: We design a series of large scale leave-one-out cross-validation experiments to show the validity and effectiveness of the Bayesian regression approach on individual PPI networks. As described in the method section, in each run of the validation procedure, we prioritize candidate genes according to the Bayes factors against two control sets, random controls and linkage intervals, with the performance being evaluated by mean rank ratios and AUC scores. Results are shown in Table 2 and Figure 2.


Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach.

Zhang W, Sun F, Jiang R - BMC Bioinformatics (2011)

ROC curves of the five PPI networks on random controls (A) and linkage intervals (B). AUC scores for the validation of random controls are average of 10 independent runs.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3044265&req=5

Figure 2: ROC curves of the five PPI networks on random controls (A) and linkage intervals (B). AUC scores for the validation of random controls are average of 10 independent runs.
Mentions: We design a series of large scale leave-one-out cross-validation experiments to show the validity and effectiveness of the Bayesian regression approach on individual PPI networks. As described in the method section, in each run of the validation procedure, we prioritize candidate genes according to the Bayes factors against two control sets, random controls and linkage intervals, with the performance being evaluated by mean rank ratios and AUC scores. Results are shown in Table 2 and Figure 2.

Bottom Line: We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC).The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method.The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 10084, China.

ABSTRACT

Background: The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, and thus greatly restrict the scope of application of such methods. Meanwhile, independently constructed and maintained PPI networks are usually quite diverse in coverage and quality, making the selection of a suitable PPI network inevitable but difficult.

Methods: We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach, and we derive an analytic form for Bayes factor that naturally measures the strength of association between a query disease and a candidate gene and thus can be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.

Results: We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.

Conclusions: The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

Show MeSH
Related in: MedlinePlus