Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies.

Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D - BMC Genet. (2005)

Bottom Line: Phenotype error causes reduction in power to detect genetic association. Our work enables researchers to specifically quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.

Affiliation: Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA. brian.edwards@yale.edu

ABSTRACT

Background: Phenotype error causes reduction in power to detect genetic association. We quantify the effect of phenotype error, also known as diagnostic error, on power and sample size calculations for case-control genetic association studies between a marker locus and a disease phenotype. We consider the classic Pearson chi-square test for independence as our test of genetic association. To determine asymptotic power analytically, we compute the non-centrality parameter of the test statistic's distribution, which is a function of the case and control sample sizes, genotype frequencies, disease prevalence, and phenotype misclassification probabilities. We derive the non-centrality parameter in the presence of phenotype errors and equivalent formulas for misclassification cost (the percentage increase in minimum sample size needed to maintain constant asymptotic power at a fixed significance level for each percentage increase in a given misclassification parameter). We use a linear Taylor series approximation for the cost of phenotype misclassification to determine lower bounds for the relative costs of misclassifying a true affected (respectively, unaffected) as a control (respectively, case). Power is verified by computer simulation.
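
The power calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the error model (theta = probability that a true affected is classified as a control, phi = probability that a true unaffected is classified as a case, both independent of genotype), the function names, and the six genotype frequencies are hypothetical. The non-centrality expression is the standard one for Pearson's chi-square on a 2 x k table.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def observed_frequencies(p_aff, p_unaff, K, theta, phi):
    """Genotype frequencies in the *classified* case and control groups.

    Assumed error model (not taken from the source):
      theta = Pr(classified control | truly affected)
      phi   = Pr(classified case    | truly unaffected)
    with errors independent of genotype; K is the disease prevalence.
    """
    p_aff = np.asarray(p_aff, dtype=float)
    p_unaff = np.asarray(p_unaff, dtype=float)
    # Share of truly affected subjects among classified cases / controls.
    a = K * (1 - theta) / (K * (1 - theta) + (1 - K) * phi)
    b = K * theta / (K * theta + (1 - K) * (1 - phi))
    cases = a * p_aff + (1 - a) * p_unaff
    ctrls = b * p_aff + (1 - b) * p_unaff
    return cases, ctrls


def noncentrality(cases, ctrls, n_cases, n_ctrls):
    """Non-centrality parameter of Pearson's chi-square for a 2 x k table."""
    return n_cases * n_ctrls * np.sum(
        (cases - ctrls) ** 2 / (n_cases * cases + n_ctrls * ctrls)
    )


def asymptotic_power(p_aff, p_unaff, n_cases, n_ctrls, K,
                     theta=0.0, phi=0.0, alpha=0.05):
    """Asymptotic power of the test with k - 1 degrees of freedom."""
    cases, ctrls = observed_frequencies(p_aff, p_unaff, K, theta, phi)
    lam = noncentrality(cases, ctrls, n_cases, n_ctrls)
    df = len(cases) - 1
    crit = chi2.ppf(1 - alpha, df)   # critical value from the central chi-square
    return ncx2.sf(crit, df, lam)    # tail probability under the alternative


# Hypothetical genotype frequencies (six genotypes), for illustration only.
P_AFF = [0.10, 0.20, 0.30, 0.15, 0.15, 0.10]
P_UNAFF = [0.05, 0.15, 0.40, 0.20, 0.12, 0.08]

power_clean = asymptotic_power(P_AFF, P_UNAFF, 500, 500, K=0.02)
power_error = asymptotic_power(P_AFF, P_UNAFF, 500, 500, K=0.02,
                               theta=0.05, phi=0.05)
print(f"power without errors: {power_clean:.3f}")
print(f"power with theta = phi = 0.05: {power_error:.3f}")
```

Under this model, misclassification pulls the observed case and control genotype frequencies toward each other, which shrinks the non-centrality parameter and therefore the power.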

Results: Our major findings are that: (i) the median absolute difference between analytic power with our method and simulation power was 0.001 and the absolute difference was no larger than 0.011; (ii) as the disease prevalence approaches 0, the cost of misclassifying an unaffected as a case becomes infinitely large while the cost of misclassifying an affected as a control approaches 0.
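
One way to build intuition for finding (ii) is to look at who ends up in each classified group as the prevalence shrinks. The sketch below is an illustration under an assumed error model, not the paper's derivation of misclassification cost: theta (true affected classified as control) and phi (true unaffected classified as case) are taken to be independent of genotype, and the function name is hypothetical.

```python
def true_affected_share(K, theta, phi):
    """Share of truly affected subjects among classified cases and controls.

    Assumed error model (not taken from the source):
      theta = Pr(classified control | truly affected)
      phi   = Pr(classified case    | truly unaffected)
    K is the disease prevalence.
    """
    in_cases = K * (1 - theta) / (K * (1 - theta) + (1 - K) * phi)
    in_ctrls = K * theta / (K * theta + (1 - K) * (1 - phi))
    return in_cases, in_ctrls


# Fix both error rates at 5% and let the prevalence shrink.
for K in (0.2, 0.02, 0.002, 0.0002):
    in_cases, in_ctrls = true_affected_share(K, theta=0.05, phi=0.05)
    print(f"K={K:<7} affected among cases={in_cases:.4f}  "
          f"affected among controls={in_ctrls:.6f}")
```

For a fixed phi > 0, the share of true affecteds in the case group tends to 0 as K approaches 0, so the case group is swamped by misclassified unaffecteds. The control group, by contrast, becomes almost free of true affecteds whatever the value of theta.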

Conclusion: Our work enables researchers to specifically quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.

Figure 1: Contour plot of minimum number of cases needed to maintain constant asymptotic power of 95% at a 5% significance level in the presence of phenotype misclassification for Alzheimer's disease ApoE example. We compute the increase in minimum cases needed to maintain constant 95% asymptotic power at the 5% significance level (using a central χ² distribution with 5 degrees of freedom) in the presence of errors. Sample sizes are computed using equation (3). The affected and unaffected genotype frequencies are taken from a previous publication [9, 14]. In that work, the marker locus considered was ApoE and the disease phenotype was Alzheimer's disease. We use the LRTae estimates from table 5 of that work [9]. Six genotypes are observed in most populations. The frequencies we use to perform the sample size calculations in figure 1 are presented in the Methods section (Minimum sample size requirements in presence of phenotype misclassification – Alzheimer's Disease ApoE example). We assume that equal numbers of cases and controls are collected. Also, we specify a prevalence K = 0.02, which is consistent with recent published reports for Alzheimer's Disease in the U.S. [32]. Sample sizes are calculated for each misclassification parameter θ, φ ranging from 0.0 to 0.15 in increments of 0.01. The number of cases ranges from 484 when θ = φ = 0 to 10,187 when θ = φ = 0.15. In this figure, each (approximately) horizontal line represents a constant sample size as a function of the misclassification parameters θ and φ. For two consecutive horizontal lines, the values in between those lines (represented by different colors) have sample sizes that are between the sample sizes indicated by the two horizontal lines.

Mentions: Figure 1 presents a contour plot of the minimum sample size necessary to maintain a constant power of 95% at the 5% significance level using the parameter values taken from the methods section (see Methods – Minimum sample size requirements in presence of phenotype misclassification – Alzheimer's disease ApoE example). Each approximately horizontal line represents a constant minimum number of cases (as a function of the misclassification parameters φ and θ). For two consecutive horizontal lines, the values in between those lines (represented by different colors) have sample sizes that are between the sample sizes indicated by the two horizontal lines. For example, consider the consecutive, approximately horizontal lines labeled 3394.9 and 4365.9 (third and fourth lines up, respectively, in figure 1). All pairs of θ and φ whose Cartesian coordinates (θ, φ) lie between these two lines have a corresponding minimum sample size between 3395 and 4365. An example of such a pair is the coordinate (0.00, 0.075). Note that the minimum sample size of 484 occurs when φ = θ = 0 and the maximum sample size of 10,187 occurs when φ = θ = 0.15.
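
A grid of minimum sample sizes like the one behind figure 1 could be computed along the following lines. This is a sketch under stated assumptions rather than a reproduction of equation (3): the ApoE genotype frequencies are not given in this excerpt, so the six frequencies below are hypothetical and the output will not match the 484 to 10,187 range reported above. The error model (theta = true affected classified as control, phi = true unaffected classified as case, independent of genotype) and all function names are likewise assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2


def required_noncentrality(df, power=0.95, alpha=0.05):
    """Non-centrality at which the chi-square test reaches the target power."""
    crit = chi2.ppf(1 - alpha, df)
    return brentq(lambda lam: ncx2.sf(crit, df, lam) - power, 1e-6, 200.0)


def min_cases(p_aff, p_unaff, K, theta, phi, lam_needed):
    """Smallest n, with n cases and n controls, that reaches lam_needed."""
    p_aff = np.asarray(p_aff, dtype=float)
    p_unaff = np.asarray(p_unaff, dtype=float)
    # Share of truly affected subjects among classified cases / controls.
    a = K * (1 - theta) / (K * (1 - theta) + (1 - K) * phi)
    b = K * theta / (K * theta + (1 - K) * (1 - phi))
    cases = a * p_aff + (1 - a) * p_unaff
    ctrls = b * p_aff + (1 - b) * p_unaff
    # With equal group sizes the non-centrality is n times this quantity.
    lam_per_subject = np.sum((cases - ctrls) ** 2 / (cases + ctrls))
    return int(np.ceil(lam_needed / lam_per_subject))


# Hypothetical genotype frequencies (six genotypes, hence 5 degrees of freedom).
P_AFF = [0.10, 0.20, 0.30, 0.15, 0.15, 0.10]
P_UNAFF = [0.05, 0.15, 0.40, 0.20, 0.12, 0.08]
K = 0.02  # prevalence used in the figure

lam_needed = required_noncentrality(df=len(P_AFF) - 1)
errors = np.linspace(0.0, 0.15, 16)  # 0.00, 0.01, ..., 0.15

# Rows index phi, columns index theta.
grid = np.array([[min_cases(P_AFF, P_UNAFF, K, t, f, lam_needed)
                  for t in errors] for f in errors])

print("cases needed with no errors:", grid[0, 0])
print("cases needed at theta = phi = 0.15:", grid[-1, -1])
```

The resulting array can be passed to a contouring routine (for example `matplotlib.pyplot.contourf(errors, errors, grid)`) to draw a plot with θ on the horizontal axis and φ on the vertical axis.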

