DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases.

Zhang W, Chen Y, Sun F, Jiang R - BMC Syst Biol (2011)

Bottom Line: Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.


Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

ABSTRACT

Background: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named "domainRBF" (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases.

Results: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), evaluated in terms of three criteria (precision, mean rank ratio, and AUC score). Through a series of permutation tests, we further show that the proposed approach is robust to the parameters involved and to the underlying domain-domain interaction network. Having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource.

Conclusions: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.
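The three evaluation criteria named in the Results can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the abstract does not spell them out: precision is taken as the fraction of validation runs in which the held-out domain is ranked first, the rank ratio as the rank divided by the number of candidates, and the AUC as the per-run fraction of control candidates ranked below the held-out domain, averaged over runs. The function name `evaluate_ranks` and its arguments are hypothetical.

```python
def evaluate_ranks(ranks, n_candidates):
    """Summarize leave-one-out results (assumed definitions, see lead-in).

    ranks        -- 1-based rank of the held-out true domain in each run
    n_candidates -- number of candidates (true domain + controls) in each run;
                    every entry must be at least 2
    """
    n_runs = len(ranks)
    # Precision: share of runs where the true domain is ranked first.
    precision = sum(1 for r in ranks if r == 1) / n_runs
    # Mean rank ratio: rank divided by candidate count, averaged over runs.
    mean_rank_ratio = sum(r / n for r, n in zip(ranks, n_candidates)) / n_runs
    # AUC with a single positive per run: fraction of controls ranked below it.
    auc = sum((n - r) / (n - 1) for r, n in zip(ranks, n_candidates)) / n_runs
    return precision, mean_rank_ratio, auc


if __name__ == "__main__":
    print(evaluate_ranks([1, 50, 100], [100, 100, 100]))
```

Under these assumed definitions, a ranking no better than chance gives a mean rank ratio and an AUC near 50%, which is the baseline the permutation tests refer to.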



Figure 5: ROC curves of the leave-one-out cross-validation experiments on shuffled data. (A) Results for random controls with domain-domain interactions shuffled. (B) Results for linkage intervals with domain-domain interactions shuffled. (C) Results for genome-wide scan with domain-domain interactions shuffled. (D) Results for random controls with known domain-phenotype associations shuffled. (E) Results for linkage intervals with known domain-phenotype associations shuffled. (F) Results for genome-wide scan with known domain-phenotype associations shuffled. (G) Results for random controls with phenotype similarity profiles shuffled. (H) Results for linkage intervals with phenotype similarity profiles shuffled. (I) Results for genome-wide scan with phenotype similarity profiles shuffled. The small domain-domain interaction network composed of the PDB part of the DOMINE database and the diffusion kernel are used to obtain the results.

Mentions: The above validation results suggest that the domainRBF approach can successfully prioritize candidate domains and place the domain that is truly associated with the query disease phenotype at the top of the candidates. However, it is still necessary to determine whether the correct prioritization of disease domains is due to the connectivity information contained in the domain-domain interactions, domain-phenotype associations, and phenotype-phenotype similarities. To this end, we artificially destroy the informative interactions in each of these three networks and examine the resulting performance. If the connectivity information drives the ranking, both the mean rank ratios and the AUC scores should then be around 50%, with very low precision. We therefore perform three permutation experiments: 1) shuffling interactions among domains while fixing the node degree (number of direct neighbours) distribution of the entire interaction network, 2) shuffling domain-phenotype associations while fixing the number of associated domains for each phenotype, and 3) shuffling the phenotype-phenotype similarities while fixing the distribution of phenotype similarities. We then repeat the leave-one-out cross-validation experiments using the shuffled networks, which contain no informative interactions among domains, between domains and phenotypes, or among phenotypes, respectively. As shown in Figure 5, the results are generally consistent with our expectation in that the AUC scores are all around 50%. We therefore conclude that the successful prioritization of candidate domains is indeed due to the informative interactions among domains that are included in the domain-domain interaction network.
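The first permutation experiment, shuffling domain-domain interactions while keeping every node's degree fixed, can be sketched in Python with repeated double-edge swaps. This is one common way to randomize a network under a degree constraint and is an assumption here; the text does not state which shuffling procedure the authors used. The function name `shuffle_preserving_degrees` and its parameters are hypothetical, and the input is assumed to be a simple undirected graph (no self-loops, no duplicate edges).

```python
import random


def shuffle_preserving_degrees(edges, n_swaps, seed=0):
    """Randomize an undirected edge list by double-edge swaps.

    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    node keeps its original number of neighbours.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up on graphs that allow few swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Require four distinct nodes so no self-loop can be created.
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would duplicate an existing edge.
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


if __name__ == "__main__":
    toy = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4)]
    print(shuffle_preserving_degrees(toy, n_swaps=10, seed=1))
```

The second and third experiments could follow the same pattern with a different constraint, for example permuting which domains are attached to each phenotype while keeping the per-phenotype count, or permuting the entries of the similarity profiles so that their overall distribution is unchanged.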

