Predicting Type 1 Diabetes Candidate Genes using Human Protein-Protein Interaction Networks.

Gao S, Wang X - J Comput Sci Syst Biol (2009)

Bottom Line: We find that the citations of the new candidates in T1D-related publications are significantly (p<1e-7) more than random, even after excluding the co-citation with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. CONCLUSION: Our study demonstrates the potential of the PPI information in prioritizing positional candidate genes for T1D.


Affiliation: Department of Physics & the Comprehensive Diabetes Center, University of Alabama at Birmingham, 1300 University Blvd, Birmingham, AL 35294, USA.

ABSTRACT
BACKGROUND: Proteins that directly interact with each other tend to have similar functions and to be involved in the same cellular processes. Mutations in the genes that code for them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases using protein-protein interaction (PPI) information, but such an approach is often considered too general to be practically useful for specific diseases.

RESULTS: In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). We compiled 266 known disease genes, and 983 positional candidate genes from the 18 established linkage loci of T1D, from T1Dbase (http://t1dbase.org). We found that the PPI network of known T1D genes has topological features distinct from others, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then define those positional candidates that are first-degree PPI neighbours of the 266 known disease genes to be new candidate disease genes. This leads to a list of 68 genes for further study. Cross validation using the known disease genes as a benchmark reveals that the enrichment is ~17.1 fold over random selection, and ~4 fold better than using the linkage information alone. We find that the citations of the new candidates in T1D-related publications are significantly (p<1e-7) more than random, even after excluding the co-citation with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates.

CONCLUSION: Our study demonstrates the potential of the PPI information in prioritizing positional candidate genes for T1D.
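The candidate definition in the abstract (positional candidates that are first-degree PPI neighbours of known disease genes) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the edge-list input format, and the toy gene labels (`K1`, `C1`, ...) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adj[a].add(b)
        adj[b].add(a)
    return adj

def first_degree_candidates(adj, known_genes, positional_candidates):
    """Return positional candidates that directly interact with a known gene."""
    known = set(known_genes)
    hits = set()
    for gene in set(positional_candidates) - known:
        # A candidate qualifies if any of its direct neighbours is a known gene.
        if adj.get(gene, set()) & known:
            hits.add(gene)
    return hits

# Hypothetical toy network: K* are known disease genes, C* are positional candidates.
toy_edges = [("K1", "C1"), ("K2", "C2"), ("C3", "X1"), ("K1", "K2")]
toy_adj = build_adjacency(toy_edges)
toy_hits = first_degree_candidates(toy_adj, {"K1", "K2"}, {"C1", "C2", "C3"})
print(sorted(toy_hits))
```

In the toy network, `C3` is excluded because its only interaction partner is not a known disease gene.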



Figure 3: The size effect of the bait set. (A) The number of predicted disease genes increases with the number of baits. (B) The efficiency of the disease gene prediction algorithm, as judged by the odds ratio of known disease genes being recovered, does not depend on the size of the bait set.

Mentions: We first evaluated the performance of the disease gene prediction algorithm using the known T1D genes as benchmarks. Each time, we randomly selected a fraction f of the known T1D genes as baits and tested how many of the remaining fraction 1-f were predicted. We tested six values of f: 1/5, 1/3, 1/2, 2/3, 4/5 and 1. For each value we repeated the procedure 20 times, except for f=1, which leaves no testing set and was therefore used only to calculate the number of predicted genes, not for cross validation. Figure 3 summarizes the results. The number of predicted genes increases with the number of baits (Figure 3A). Interestingly, the increase appears to slow as the number of baits grows. This could be due to the limitations of our current knowledge of PPI (incompleteness and quality issues, for example); it may also suggest that the total number of T1D disease genes is limited. Further investigation of this phenomenon is needed when we have a better understanding of PPI and T1D disease biology. The efficiency of recovering the known disease genes, defined as the odds of disease gene enrichment in predicted candidates over random, seems to be affected little by the number of baits, as shown in Figure 3B. The high enrichment ratios, at ~17.1 (14.1-18.6) fold, suggest that our baiting algorithm can recover the known disease genes well.
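The bait-based cross validation described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. The text does not give the exact formula for the enrichment, so the sketch assumes a simple fold enrichment: the fraction of held-out known genes among the predictions, divided by their fraction among all non-bait genes. The function names, the `universe` argument, and the toy network are hypothetical.

```python
import random

def predict(adj, baits, universe):
    """Non-bait genes in `universe` with at least one bait as a direct PPI neighbour."""
    baits = set(baits)
    return {g for g in set(universe) - baits if adj.get(g, set()) & baits}

def cross_validate(adj, known, universe, f, repeats=20, seed=0):
    """Repeat the bait/held-out split `repeats` times for bait fraction f.

    Returns a list of (n_predicted, n_recovered, fold_enrichment) tuples.
    """
    rng = random.Random(seed)
    known_list = sorted(known)  # sorted so the sampling is reproducible
    universe = set(universe)
    n_bait = round(f * len(known_list))
    results = []
    for _ in range(repeats):
        baits = set(rng.sample(known_list, n_bait))
        held_out = set(known_list) - baits
        predicted = predict(adj, baits, universe)
        recovered = predicted & held_out
        # Assumed definition: observed hit rate over the random-selection rate.
        background = len(held_out) / len(universe - baits)
        observed = len(recovered) / len(predicted) if predicted else 0.0
        fold = observed / background if background else float("nan")
        results.append((len(predicted), len(recovered), fold))
    return results

# Hypothetical toy network: six known genes that all interact with each other,
# plus ten unrelated genes with no interactions.
cv_known = {f"K{i}" for i in range(6)}
cv_universe = cv_known | {f"U{i}" for i in range(10)}
cv_adj = {g: cv_known - {g} for g in cv_known}

cv_half = cross_validate(cv_adj, cv_known, cv_universe, f=1/2)
cv_full = cross_validate(cv_adj, cv_known, cv_universe, f=1, repeats=1)
```

With f=1 there are no held-out genes, so the fold enrichment is undefined (NaN here), which mirrors the statement that f=1 was used only to count predicted genes.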

