DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases.

Zhang W, Chen Y, Sun F, Jiang R - BMC Syst Biol (2011)

Bottom Line: Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.


Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

ABSTRACT

Background: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named "domainRBF" (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases.
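The ranking idea in the Background can be illustrated with a minimal sketch. It rests on assumptions not stated in this section. Each observation is another disease p. The response y_p is the phenotype similarity between the query disease and p. The predictor x_p is the network proximity between a candidate domain and the domains already associated with p. The paper's own prior is described in its Materials and Methods; in its place we substitute a Zellner g-prior Bayes factor with unit-information g = n. We also count only positive slopes as support. The helper names log_bayes_factor and rank_candidates and the toy domains DomA and DomB are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def log_bayes_factor(x: Sequence[float], y: Sequence[float]) -> float:
    """log BF of y = a + b*x + noise against y = a + noise.

    Stand-in formulation (Zellner g-prior, g = n, one predictor):
        BF = (1 + g)^((n - 2) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2)
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, at least 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Assumption: only a positive proximity/similarity relationship counts as
    # support; flat or negative relationships are scored as R^2 = 0.
    r2 = sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 and sxy > 0 else 0.0
    g = float(n)
    return 0.5 * (n - 2) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


def rank_candidates(phenotype_sim: Sequence[float],
                    proximity: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate domains by decreasing log Bayes factor."""
    scores = {d: log_bayes_factor(x, phenotype_sim) for d, x in proximity.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    y = [0.9, 0.7, 0.4, 0.2, 0.1]               # toy phenotype similarities
    candidates = {
        "DomB": [0.1, 0.5, 0.2, 0.6, 0.3],      # unrelated proximity profile
        "DomA": [0.8, 0.6, 0.5, 0.2, 0.1],      # tracks phenotype similarity
    }
    print(rank_candidates(y, candidates))       # ['DomA', 'DomB']
```

Working on the log scale avoids overflow when many diseases enter the regression. Ranking by the log Bayes factor gives the same order as ranking by the Bayes factor itself.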

Results: Using a compiled dataset of 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), evaluated with three criteria (precision, mean rank ratio, and AUC score). Through a series of permutation tests, we further show that the proposed approach is robust to the parameters involved and to the underlying domain-domain interaction network. Having assessed the validity of this approach, we demonstrate that ab initio inference of domain-disease and gene-disease associations is feasible. We also show the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource.
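The three criteria can be made concrete with a short sketch. The definitions below are our assumptions rather than quotations from the paper; the exact definitions are in its Materials and Methods. In each leave-one-out test, the held-out domain is scored together with a set of control candidates. Precision is taken as the fraction of tests in which the held-out domain ranks first. The mean rank ratio is the average of its rank divided by the number of candidates, so lower is better. Because each test has a single positive, the per-test AUC reduces to the fraction of control candidates ranked below it. Ties are broken pessimistically, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of candidate `target` when scores are sorted high to low.

    Ties are broken pessimistically: every other candidate with an equal
    score is placed ahead of the target.
    """
    t = scores[target]
    return 1 + sum(1 for i, s in enumerate(scores) if i != target and s >= t)


def evaluate_loocv(tests: Sequence[Tuple[Sequence[float], int]]) -> Tuple[float, float, float]:
    """Return (precision, mean rank ratio, AUC) averaged over all tests.

    Each test is (candidate scores, index of the held-out true domain).
    """
    precision, rank_ratio, auc = [], [], []
    for scores, target in tests:
        n = len(scores)
        if n < 2:
            raise ValueError("each test needs at least one control candidate")
        r = rank_of_target(scores, target)
        precision.append(1.0 if r == 1 else 0.0)
        rank_ratio.append(r / n)
        auc.append((n - r) / (n - 1))  # share of controls ranked below the true domain
    return mean(precision), mean(rank_ratio), mean(auc)


if __name__ == "__main__":
    toy_tests = [
        ([0.9, 0.1, 0.5, 0.3], 0),  # true domain ranked 1st of 4
        ([0.2, 0.8, 0.4, 0.6], 0),  # true domain ranked 4th of 4
    ]
    print(evaluate_loocv(toy_tests))  # (0.5, 0.625, 0.5)
```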

Conclusions: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.



Figure 6: Effects of parameters on distance measures. (A) Influence of β (from 0.1 to 10 in steps of 0.1) on the shortest path measure with Gaussian kernel. (B) Influence of γ (from 0.01 to 1 in steps of 0.01) on the diffusion kernel. Results were obtained on the small domain-domain interaction network built from the PDB portion of the DOMINE database.
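The two proximity measures whose parameters Figure 6 varies can be sketched under stated assumptions. The diffusion kernel is taken in its standard form K = exp(-γL), where L = D - A is the Laplacian of the domain-domain interaction network. For the shortest path measure with Gaussian kernel, we assume the form exp(-d²/β²), where d is the hop distance between two domains. The paper's Materials and Methods give the exact definition. The toy adjacency matrix and the function names are illustrative.

```python
import math
from collections import deque

import numpy as np


def hop_distances(adj: np.ndarray, source: int) -> np.ndarray:
    """Breadth-first-search hop counts from `source` (inf if unreachable)."""
    dist = np.full(adj.shape[0], math.inf)
    dist[source] = 0.0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in map(int, np.flatnonzero(adj[u])):
            if math.isinf(dist[v]):
                dist[v] = dist[u] + 1.0
                queue.append(v)
    return dist


def gaussian_shortest_path_kernel(adj: np.ndarray, beta: float) -> np.ndarray:
    """Assumed form: k(i, j) = exp(-d(i, j)^2 / beta^2); unreachable pairs get 0."""
    return np.vstack([np.exp(-hop_distances(adj, i) ** 2 / beta ** 2)
                      for i in range(adj.shape[0])])


def diffusion_kernel(adj: np.ndarray, gamma: float) -> np.ndarray:
    """K = expm(-gamma * L) with L = D - A, via the eigendecomposition of L."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # L is symmetric
    return eigvecs @ np.diag(np.exp(-gamma * eigvals)) @ eigvecs.T


if __name__ == "__main__":
    # Toy path network: domain 0 - domain 1 - domain 2.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    print(np.round(gaussian_shortest_path_kernel(adj, beta=1.0)[0], 4))
    print(np.round(diffusion_kernel(adj, gamma=0.05)[0], 4))
```

In this form, a small β drives every off-diagonal entry of the Gaussian kernel toward zero, and a large γ spreads diffusion mass across the whole network. Both parameters therefore control how far the notion of "proximity" reaches.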

Mentions: We note that the parameter β in the shortest path measure with Gaussian kernel and the parameter γ in the diffusion kernel are free parameters that must be set in advance (see Materials and Methods for details). In the cross-validation experiments above, we set these parameters to 1 and 0.05, respectively, for simplicity. However, it is necessary to check whether the prioritization methods are sensitive to these choices. For this purpose, we select several values across the range of each parameter, repeat the cross-validation experiments, and examine how the results change. We use the prioritization results of the domainRBF approach against random controls (Table 1) as an example to illustrate the influence of β. Since β can take any positive value, we perform a grid search from 0.1 to 10 in steps of 0.1. We then track the effect on precision, mean rank ratio, and AUC score, as shown in Figure 6(A). As β increases from 0.1 to 1, performance improves markedly on all three criteria: precision and AUC score rise, while mean rank ratio (for which lower is better) falls. Beyond β = 1.0 (precision = 26.56%, mean rank ratio = 11.99%, and AUC score = 88.80%), the curves remain fairly stable. Even within this plateau, performance peaks at β = 3.7 (precision = 28.89%, mean rank ratio = 10.67%, and AUC score = 90.20%). The worst performance is obtained at β = 0.1 (precision = 18.01%, mean rank ratio = 19.23%, and AUC score = 81.63%). From these results, we conclude that the prioritization methods are not sensitive to this free parameter when β is greater than 1. Similarly, the prioritization methods are not sensitive to the free parameter γ when it is smaller than 0.15. The corresponding changes in precision, mean rank ratio, and AUC score over the grid from 0.01 to 1 in steps of 0.01 are shown in Figure 6(B).
The peak performance is obtained at γ = 0.03 (precision = 29.55%, mean rank ratio = 10.61%, and AUC score = 90.16%), and the worst at γ = 0.93 (precision = 24.86%, mean rank ratio = 13.34%, and AUC score = 87.46%). These results indicate that the proposed approach is robust to β in the shortest path measure with Gaussian kernel when β is greater than 1, and to γ in the diffusion kernel when γ is smaller than 0.15.
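The sensitivity analysis amounts to a one-dimensional grid search. The sketch below builds the two grids from integer multiples of the step, so that, for example, 3.7 appears exactly rather than as an accumulated floating-point sum. It runs a caller-supplied cross-validation routine at each value and reports the values with the highest and lowest AUC score. The `evaluate` callback and all function names are hypothetical. The usage example uses only the metric triples quoted in the text above, not the full curves in Figure 6.

```python
from typing import Callable, Dict, List, Tuple

# (precision %, mean rank ratio %, AUC score %) for one cross-validation run
Metrics = Tuple[float, float, float]


def linear_grid(step: float, n_points: int, decimals: int) -> List[float]:
    """step, 2*step, ..., n_points*step, built from integer multiples."""
    return [round(step * k, decimals) for k in range(1, n_points + 1)]


BETA_GRID = linear_grid(0.1, 100, 1)     # 0.1, 0.2, ..., 10.0
GAMMA_GRID = linear_grid(0.01, 100, 2)   # 0.01, 0.02, ..., 1.0


def sweep(evaluate: Callable[[float], Metrics], grid: List[float]) -> Dict[float, Metrics]:
    """Run the (caller-supplied) cross-validation once per parameter value."""
    return {value: evaluate(value) for value in grid}


def peak_and_worst(results: Dict[float, Metrics]) -> Tuple[float, float]:
    """Parameter values with the highest and the lowest AUC score."""
    ordered = sorted(results, key=lambda value: results[value][2])
    return ordered[-1], ordered[0]


if __name__ == "__main__":
    # Only the beta values quoted in the text; the full curve is in Figure 6(A).
    reported_beta = {
        0.1: (18.01, 19.23, 81.63),
        1.0: (26.56, 11.99, 88.80),
        3.7: (28.89, 10.67, 90.20),
    }
    print(peak_and_worst(reported_beta))  # (3.7, 0.1)
```

Selecting by AUC score is one design choice. Among the quoted β and γ points, ranking by precision or by mean rank ratio would pick the same peak and worst values.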

