Analysis of protein sequence and interaction data for candidate disease gene prediction.

George RA, Liu JY, Feng LL, Bryson-Richardson RJ, Fatkin D, Wouters MA - Nucleic Acids Res. (2006)

Bottom Line: Linkage analysis is a successful procedure to associate diseases with specific genomic regions. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.


Affiliation: Computational Biology & Bioinformatics Program, Sydney, NSW, Australia.

ABSTRACT
Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes experimental identification of the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein-protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms accept two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.
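
As a rough illustration of how the figures quoted in the abstract relate to one another, the sketch below computes sensitivity, specificity and a fold enrichment from the four counts of a confusion matrix. The function name `prioritization_metrics` and the example counts are hypothetical, and the enrichment formula (fraction of disease genes among the returned candidates divided by their fraction in the whole interval) is an assumption, since this page does not state the authors' exact definition.

```python
def prioritization_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, enrichment) from confusion counts.

    tp: disease genes returned as candidates
    fp: non-disease genes returned as candidates
    fn: disease genes not returned
    tn: non-disease genes not returned
    """
    sensitivity = tp / (tp + fn)          # fraction of disease genes recovered
    specificity = tn / (tn + fp)          # fraction of non-disease genes excluded
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)            # disease-gene fraction among candidates
    baseline = (tp + fn) / total          # disease-gene fraction in the interval
    enrichment = precision / baseline     # ASSUMED definition of fold enrichment
    return sensitivity, specificity, enrichment


# Illustrative counts only, not data from the paper:
# 100 genes in total, 2 disease genes, 4 candidates returned, 1 of them correct.
sens, spec, enrich = prioritization_metrics(tp=1, fp=3, fn=1, tn=95)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} enrichment={enrich:.1f}-fold")
```

Under this definition, a method that returns few candidates while still recovering the disease gene scores a high enrichment, which is why sensitivity and specificity are reported alongside it.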

Figure 2: Performance of PPI data from (a) OPHID, (b) OPHIDh, (c) OPHIDlit+ and (d) OPHIDlit−. Results are shown for three levels of interaction, using the shortest path length to a disease gene (Distance). Black diamonds represent the number of disease genes found. The number of non-disease genes returned is shown for the 50 gene interval (square), 100 gene interval (triangle) and 150 gene interval (x). The number of disease genes returned by random selection is shown for the 50 gene interval (*), 100 gene interval (circle) and 150 gene interval (+).

Mentions: Figure 2 shows the number of false positives returned by the interaction data at increasing path lengths, up to a distance of three interactions from the known disease genes. As the shortest path length increases, the sensitivity improves, but the number of false positives increases exponentially and reduces the specificity. At a distance of two interactions, the full OPHID set finds 84 disease genes with a sensitivity of 0.49, a specificity of 0.96 and an enrichment of 11-fold. Increasing the distance to three interactions finds 123 disease genes, with a high sensitivity of 0.72 but a lower specificity of 0.82 and a poor 4-fold enrichment.
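
The distance-based selection described above can be sketched as a breadth-first search over an undirected interaction network, starting from all known disease genes at once. This is a minimal illustration and not the authors' CPS implementation: the function name `candidates_within_distance`, the edge-list input format and the toy gene names are all assumptions.

```python
from collections import deque


def candidates_within_distance(interactions, disease_genes, max_distance):
    """Map each gene within max_distance interactions of a known disease gene
    to its shortest path length; the seed disease genes are excluded."""
    # Build an undirected adjacency map from (gene_a, gene_b) pairs.
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Multi-source BFS: every known disease gene in the network starts at 0.
    distance = {g: 0 for g in disease_genes if g in graph}
    queue = deque(distance)
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_distance:
            continue  # do not expand beyond the distance cutoff
        for neighbour in graph[gene]:
            if neighbour not in distance:
                distance[neighbour] = distance[gene] + 1
                queue.append(neighbour)

    return {g: d for g, d in distance.items() if d > 0}


# Toy chain A-B-C-D-E with A as the only known disease gene (hypothetical names).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(candidates_within_distance(edges, {"A"}, max_distance=2))
```

Raising `max_distance` can only add genes to the returned set, which mirrors the trade-off reported above: more disease genes are reached, but the candidate list grows quickly.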

