Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies.

Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D - BMC Genet. (2005)

Bottom Line: Phenotype error causes reduction in power to detect genetic association. Our work enables researchers to specifically quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.

Affiliation: Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA. brian.edwards@yale.edu

ABSTRACT

Background: Phenotype error causes reduction in power to detect genetic association. We quantify the effect of phenotype error, also known as diagnostic error, on power and sample size calculations for case-control genetic association studies between a marker locus and a disease phenotype. We consider the classic Pearson chi-square test for independence as our test of genetic association. To determine asymptotic power analytically, we compute the non-centrality parameter of the test statistic's asymptotic (non-central chi-square) distribution; this parameter is a function of the case and control sample sizes, genotype frequencies, disease prevalence, and phenotype misclassification probabilities. We derive the non-centrality parameter in the presence of phenotype errors, together with equivalent formulas for misclassification cost (the percentage increase in minimum sample size needed to maintain constant asymptotic power at a fixed significance level for each percentage increase in a given misclassification parameter). We use a linear Taylor series approximation for the cost of phenotype misclassification to determine lower bounds for the relative costs of misclassifying a true affected (respectively, unaffected) as a control (respectively, case). Power is verified by computer simulation.

Results: Our major findings are that (i) the median absolute difference between analytic power with our method and simulation power was 0.001, and the absolute difference did not exceed 0.011; and (ii) as the disease prevalence approaches 0, the cost of misclassifying an unaffected individual as a case becomes infinitely large, while the cost of misclassifying an affected individual as a control approaches 0.

Conclusion: Our work enables researchers to specifically quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.
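A short derivation makes the cost definition and finding (ii) concrete. It is a sketch under our own assumptions rather than the paper's equations: we use a textbook form of the Pearson chi-square non-centrality parameter, hold the fraction of cases fixed, and assume that the classified case and control groups are prevalence-weighted mixtures of truly affected and truly unaffected people. We also assume that θ is the probability that a truly unaffected person is classified as a case and φ the probability that a truly affected person is classified as a control; these roles are our reading of the abstract and should be checked against the full text. Here p_gj is the frequency of genotype j in classified group g, g_A and g_U are the true affected and unaffected genotype frequencies, and π_case and π_ctrl are the fractions of classified cases and controls who are truly affected.

```latex
\begin{aligned}
\lambda(\theta,\phi) &= \sum_{g\in\{\mathrm{case},\,\mathrm{ctrl}\}} \sum_{j} n_g\,
  \frac{\bigl(p_{gj}-\bar p_j\bigr)^2}{\bar p_j} = N\,w^2(\theta,\phi),
  \qquad \bar p_j = \sum_{g} \frac{n_g}{N}\,p_{gj},\\[4pt]
\frac{N_{\min}(\theta,\phi)}{N_{\min}(0,0)} &= \frac{w^2(0,0)}{w^2(\theta,\phi)}
  \approx 1 + C_\theta\,\theta + C_\phi\,\phi,
  \qquad C_\theta = -\frac{1}{w^2(0,0)}\,\frac{\partial w^2}{\partial\theta}\bigg|_{(0,0)},
  \quad C_\phi = -\frac{1}{w^2(0,0)}\,\frac{\partial w^2}{\partial\phi}\bigg|_{(0,0)},\\[4pt]
\pi_{\mathrm{case}} &= \frac{K(1-\phi)}{K(1-\phi)+(1-K)\,\theta},
  \qquad \pi_{\mathrm{ctrl}} = \frac{K\phi}{K\phi+(1-K)(1-\theta)},\\[4pt]
p_{\mathrm{case},j}-p_{\mathrm{ctrl},j} &= \bigl(\pi_{\mathrm{case}}-\pi_{\mathrm{ctrl}}\bigr)\bigl(g_{A,j}-g_{U,j}\bigr)
  \;\xrightarrow[K\to 0]{}\;
  \begin{cases} 0, & \theta>0,\\ g_{A,j}-g_{U,j}, & \theta=0. \end{cases}
\end{aligned}
```

Because λ is linear in N, the minimum sample size for a target power scales as 1/w², so its relative increase depends only on how misclassification shrinks w². Under this model, as K → 0 with θ > 0 the classified cases and controls become genetically indistinguishable, driving w² to 0 and the required sample size without bound; with θ = 0 the case group stays pure and φ contaminates the controls only negligibly. For example, with K = 0.01, θ = 0.05 and φ = 0, π_case = 0.01/0.0595 ≈ 0.17, so roughly five of every six classified cases would be truly unaffected.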

Figure 2: Power to detect association for two settings of disease prevalence when only one phenotype misclassification parameter is non-zero. The horizontal axis gives the value of one misclassification parameter while the other is held at 0. For example, the graphs labeled "φ = 0" show power at two disease prevalences (K = 0.05, K = 0.01) as a function of θ, ranging from 0.0 to 0.15. Similarly, the graphs labeled "θ = 0" show power at the same two prevalences as a function of φ, ranging from 0.0 to 0.15.

Mentions: Another way to interpret cost is through the loss of power at a fixed sample size, as shown in Figure 2. There, we present power in the presence of phenotype misclassification when either θ or φ is set to 0 and the other parameter ranges from 0 to 0.15 in increments of 0.01. Power is calculated at the 1% significance level assuming 250 cases and 250 controls, a SNP locus with case minor allele frequency 0.05 and control minor allele frequency 0.15 (Hardy-Weinberg equilibrium in both populations), and two settings of disease prevalence (K = 0.05, 0.01). Power is determined by calculating the non-centrality parameter (equation (2)).
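The Figure 2 setting can be approximated with a short, standard-library Python sketch. It rests on several assumptions not stated above: the test is the 2 x 3 genotype Pearson chi-square test with 2 degrees of freedom; the quoted case and control minor allele frequencies (0.05, 0.15) are those of truly affected and truly unaffected individuals; classified cases and controls are prevalence-weighted mixtures of the two true groups; and the non-centrality parameter uses a textbook formula rather than the paper's equation (2). All function and parameter names (for example `p_unaff_as_case`, `p_aff_as_control`) are hypothetical, and we do not claim the printed values match the published figure.

```python
import math
from typing import List, Tuple

# Sketch of the Figure 2 setting under stated assumptions (not the paper's
# equation (2)): 2 x 3 genotype Pearson chi-square test (df = 2), a
# prevalence-weighted misclassification mixture, and a textbook NCP.


def hwe_genotypes(maf: float) -> List[float]:
    """Genotype frequencies (minor homozygote, heterozygote, major homozygote) under HWE."""
    return [maf * maf, 2 * maf * (1 - maf), (1 - maf) ** 2]


def observed_groups(g_aff: List[float], g_unaff: List[float], K: float,
                    p_unaff_as_case: float,
                    p_aff_as_control: float) -> Tuple[List[float], List[float]]:
    """Genotype frequencies in the groups *classified* as cases and controls.

    Hypothetical parameters: p_unaff_as_case = P(labelled case | truly
    unaffected), p_aff_as_control = P(labelled control | truly affected).
    """
    a_case = K * (1 - p_aff_as_control)        # affected, labelled case
    u_case = (1 - K) * p_unaff_as_case        # unaffected, labelled case
    u_ctrl = (1 - K) * (1 - p_unaff_as_case)  # unaffected, labelled control
    a_ctrl = K * p_aff_as_control             # affected, labelled control
    case = [(a_case * a + u_case * u) / (a_case + u_case) for a, u in zip(g_aff, g_unaff)]
    ctrl = [(a_ctrl * a + u_ctrl * u) / (a_ctrl + u_ctrl) for a, u in zip(g_aff, g_unaff)]
    return case, ctrl


def noncentrality(case: List[float], ctrl: List[float], n_case: int, n_ctrl: int) -> float:
    """Textbook asymptotic NCP of the Pearson chi-square test on a 2 x m table."""
    n = n_case + n_ctrl
    lam = 0.0
    for pc, pk in zip(case, ctrl):
        pbar = (n_case * pc + n_ctrl * pk) / n  # pooled genotype frequency
        lam += (n_case * (pc - pbar) ** 2 + n_ctrl * (pk - pbar) ** 2) / pbar
    return lam


def _pois(m: int, rate: float) -> float:
    """Poisson probability e^{-rate} rate^m / m!, computed in log space."""
    if rate == 0.0:
        return 1.0 if m == 0 else 0.0
    return math.exp(-rate + m * math.log(rate) - math.lgamma(m + 1))


def ncx2_sf_even(x: float, df: int, nc: float) -> float:
    """P(X > x) for a noncentral chi-square with even df, as a Poisson mixture."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    h, mu, k = x / 2.0, nc / 2.0, df // 2
    sf = sum(_pois(j, h) for j in range(k))  # central chi-square tail, df = 2k
    total = 0.0
    for i in range(int(mu + 12 * math.sqrt(mu) + 50)):  # truncated Poisson series
        total += _pois(i, mu) * sf
        sf += _pois(k + i, h)  # raising df by 2 adds one Poisson term
    return min(total, 1.0)


def power(n_case: int, n_ctrl: int, maf_aff: float, maf_unaff: float, K: float,
          p_unaff_as_case: float, p_aff_as_control: float, alpha: float = 0.01) -> float:
    """Asymptotic power of the df = 2 genotype test under the assumed error model."""
    case, ctrl = observed_groups(hwe_genotypes(maf_aff), hwe_genotypes(maf_unaff),
                                 K, p_unaff_as_case, p_aff_as_control)
    lam = noncentrality(case, ctrl, n_case, n_ctrl)
    crit = -2.0 * math.log(alpha)  # exact upper-alpha point of a central chi-square, df = 2
    return ncx2_sf_even(crit, 2, lam)


if __name__ == "__main__":
    # 250 cases, 250 controls, MAF 0.05 vs 0.15, alpha = 0.01, error 0.00..0.15.
    for K in (0.05, 0.01):
        for step in range(16):
            e = step / 100
            unaff_as_case = power(250, 250, 0.05, 0.15, K, e, 0.0)
            aff_as_control = power(250, 250, 0.05, 0.15, K, 0.0, e)
            print(f"K={K:.2f} error={e:.2f} "
                  f"unaffected->case {unaff_as_case:.3f} "
                  f"affected->control {aff_as_control:.3f}")
```

Under these assumptions we would expect labelling unaffected people as cases to erode power much faster at K = 0.01 than at K = 0.05, while labelling affected people as controls changes power only slightly at either prevalence. The non-central chi-square tail is computed as a Poisson mixture of central chi-square tails, which has a closed form for even degrees of freedom, so no third-party library is needed; where SciPy is available, `scipy.stats.ncx2.sf(crit, 2, lam)` gives the same quantity.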

