DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases.

Zhang W, Chen Y, Sun F, Jiang R - BMC Syst Biol (2011)

Bottom Line: Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.


Affiliation: MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China.

ABSTRACT

Background: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named "domainRBF" (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases.

Results: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), evaluated in terms of three criteria (precision, mean rank ratio, and AUC score). Through a series of permutation tests, we further show that the proposed approach is robust to the parameters involved and to the underlying domain-domain interaction network. Having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource.

Conclusions: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.



Figure 8: Modular organization of the predicted landscape of human disease phenotypes. (A) Two-way hierarchical clustering heat map for the landscape of domain-phenotype associations. (B) Zoomed-in plot of the pink circle region in the heat map, involving 17 muscular diseases and 20 related protein domains.

Mentions: On the basis of the above prioritization results, we aggregate the Bayes factors between all 5,490 domains and 1,145 phenotypes, obtaining a matrix of 6,286,050 elements. We first apply a base-10 logarithmic transform to the original matrix and then remove the rows in which all values are smaller than 0.1 before clustering. Since phenotypes that cluster together generally have a similar molecular basis or share significant genetic overlap [32], we perform a two-way hierarchical clustering [72] to identify areas that are highly enriched in large Bayes factors. The clustering result is shown as a heat map in Figure 8(A). We then manually inspect each phenotype cluster and annotate it with one of the 22 disorder classes based on the physiological system affected [73]. Clustering produces many high-scoring blocks in the heat map, each of which represents a set of functionally related domains implicated in a set of genetically overlapping phenotypes [32]. As an example, we take the region in the pink circle, which is enlarged in Figure 8(B). The phenotypes in this region are enriched with diseases of the muscle system, and the domains are conjectured to share similar functions with adjacent domains in the same region.
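The preprocessing and clustering steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `cluster_landscape`, the toy matrix, the choice of average linkage with Euclidean distance, and the assumptions that rows are domains and that the 0.1 threshold applies to the log-transformed values are all ours, since the text does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list


def cluster_landscape(bayes_factors, threshold=0.1, method="average"):
    """Log-transform a Bayes factor matrix, filter rows, and reorder it
    by two-way hierarchical clustering (hypothetical helper)."""
    # Base-10 log transform; Bayes factors are positive, so the log is defined.
    log_bf = np.log10(np.asarray(bayes_factors, dtype=float))
    # Drop rows in which every value is below the threshold
    # (assumption: the threshold is applied after the log transform).
    keep = ~(log_bf < threshold).all(axis=1)
    filtered = log_bf[keep]
    # Cluster rows and columns separately, then read off the leaf orders.
    row_order = leaves_list(linkage(filtered, method=method))
    col_order = leaves_list(linkage(filtered.T, method=method))
    # Reordered matrix, ready to be drawn as a heat map.
    heat = filtered[np.ix_(row_order, col_order)]
    # Also return the original row and column indices in display order.
    return heat, np.flatnonzero(keep)[row_order], col_order


# Toy data: 6 "domains" x 5 "phenotypes" with two high-scoring blocks.
rng = np.random.default_rng(0)
toy = rng.uniform(0.5, 1.2, size=(6, 5))  # background, log10 < 0.1
toy[0:2, 0:2] = 100.0                     # block 1: rows 0-1, columns 0-1
toy[3:5, 3:5] = 50.0                      # block 2: rows 3-4, columns 3-4
heat, row_ids, col_ids = cluster_landscape(toy)
print(heat.shape, row_ids, col_ids)
```

In the toy example, the two background-only rows are filtered out and the rows belonging to the same block end up next to each other, which is the kind of block structure the heat map is meant to reveal.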

