POCUS: mining genomic sequence annotation to predict disease genes.

Turner FS, Clutterbuck DR, Semple CA - Genome Biol. (2003)

Bottom Line: Here we present POCUS (prioritization of candidate genes using statistics), a novel computational approach to prioritize candidate disease genes that is based on over-representation of functional annotation between loci for the same disease. We show that POCUS can provide high (up to 81-fold) enrichment of real disease genes in the candidate-gene shortlists it produces compared with the original large sets of positional candidates. In contrast to existing methods, POCUS can also suggest counterintuitive candidates.


Affiliation: MRC Human Genetics Unit, Crewe Road, Western General Hospital, Edinburgh EH4 2XU, UK.



Figure 1: POCUS results per locus for positive control sets of disease genes: the percentage of loci for each of three outcomes is plotted against locus size (100, 500, and 1,000 IDs) at two threshold scores (0.5 and 0.8). The outcome 'No genes exceed threshold' corresponds to the rate of false negatives, 'Only non-disease genes exceed threshold' corresponds to the rate of false positives, and 'Disease gene exceeds threshold' corresponds to the rate of true positives.
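The three per-locus outcomes named in the caption can be expressed as a small classification rule. The following is a minimal sketch, not the authors' implementation: the function name `classify_locus`, the dictionary of per-gene scores, and the use of a strict "greater than" comparison for "exceeds" are all assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    """The three per-locus outcomes named in the Figure 1 caption."""
    NO_GENES = "No genes exceed threshold"                       # false negative
    ONLY_NON_DISEASE = "Only non-disease genes exceed threshold"  # false positive
    DISEASE_GENE = "Disease gene exceeds threshold"               # true positive


def classify_locus(scores, disease_gene, threshold):
    """Classify one locus from hypothetical per-gene scores.

    scores: dict mapping gene ID -> score (assumed input format).
    disease_gene: ID of the known disease gene in this locus.
    threshold: score cut-off, e.g. 0.5 or 0.8.
    """
    # Assumption: "exceeds" means strictly greater than the threshold.
    above = {gene for gene, score in scores.items() if score > threshold}
    if not above:
        return Outcome.NO_GENES
    if disease_gene in above:
        return Outcome.DISEASE_GENE
    return Outcome.ONLY_NON_DISEASE


# Hypothetical scores for a three-gene locus.
example_scores = {"geneA": 0.9, "geneB": 0.6, "geneC": 0.1}
print(classify_locus(example_scores, "geneA", 0.8).value)
```

Tallying these outcomes over all loci of a given size, and dividing by the number of loci, would give percentages of the kind plotted in the figure.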

Mentions: For any given locus, three outcomes were possible: POCUS may have returned the disease gene (and often others) above the threshold, it may have returned only non-disease genes above the threshold, or it may have returned no genes above the threshold. Figure 1 depicts the rate of each outcome per locus. In 49-75% of loci (depending on locus size), no genes scored above the threshold of 0.8; that is, POCUS detected no candidates but also returned no non-disease genes. In 6-15% of loci, only non-disease genes exceeded this threshold, and in the remaining loci the disease gene was correctly identified (45%, 15% and 11%, respectively, at the three locus sizes). As Figure 1 shows, the more stringent threshold of 0.8, compared with 0.5, caused a small loss of true positives (correctly identified disease genes) but more effectively reduced the number of false positives (non-disease genes) returned as candidates by POCUS. At the 0.8 threshold, the relative enrichment for disease genes among the genes above the threshold was 12-fold (95% confidence interval (CI): 9.74-15.83), 29-fold (95% CI: 18.79-43.24) and 42-fold (95% CI: 25.36-69.45), respectively, at the three locus sizes. This means, for example, that a gene from a locus 1,000 IDs in size was 42 times more likely to be the disease gene if it was picked from the genes above the threshold than if it was chosen at random from the locus.
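The interpretation of relative enrichment given above (the chance of picking the disease gene from the shortlist versus from the whole locus) can be written as a ratio of two proportions. The sketch below assumes that reading; the exact formula and the confidence-interval method are not stated in this excerpt, and the function name `fold_enrichment` and all counts are hypothetical.

```python
def fold_enrichment(disease_above, genes_above, disease_total, genes_total):
    """Ratio of P(disease gene | above threshold) to P(disease gene | locus).

    disease_above: disease genes scoring above the threshold.
    genes_above:   all genes scoring above the threshold.
    disease_total: disease genes among all positional candidates.
    genes_total:   all positional candidates.
    Counts may be pooled over many loci (an assumption of this sketch).
    """
    if genes_above <= 0 or disease_total <= 0 or genes_total <= 0:
        raise ValueError("counts in the denominators must be positive")
    # (disease_above / genes_above) / (disease_total / genes_total),
    # rearranged into a single division to limit rounding error.
    return (disease_above * genes_total) / (genes_above * disease_total)


# Hypothetical example: one disease gene in a 1,000-gene locus, and a
# shortlist of 10 genes above the threshold that contains it.
print(fold_enrichment(disease_above=1, genes_above=10,
                      disease_total=1, genes_total=1000))
```

In this made-up case the shortlist is 100 times richer in disease genes than the locus as a whole; the reported 12-, 29- and 42-fold values would arise from the study's own counts, which are not reproduced here.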

