A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies.

Jacobs KB, Yeager M, Wacholder S, Craig D, Kraft P, Hunter DJ, Paschal J, Manolio TA, Tucker M, Hoover RN, Thomas GD, Chanock SJ, Chatterjee N - Nat. Genet. (2009)

ABSTRACT
Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
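The abstract describes the statistic only in words. It compares logarithms of genotype frequencies, whereas Homer et al. compare distances to allele frequencies. The Python sketch below illustrates that contrast under stated assumptions and is not the authors' code.

It assumes a membership score of the form Σ_j [log f_study(g_j) − log f_ref(g_j)], summed over SNPs j for an individual with genotypes g_j ∈ {0, 1, 2}. For comparison, it uses a Homer-style score Σ_j (|y_j − p_ref,j| − |y_j − p_study,j|) with y_j = g_j / 2.

The following are all hypothetical choices made for illustration:
- the function names;
- the `eps` floor for genotypes unseen in the study sample;
- the simulated data (independent SNPs under Hardy-Weinberg equilibrium, with known population frequencies standing in for a reference sample).

```python
import numpy as np


def genotype_freqs(genotypes):
    """Per-SNP frequencies of genotypes 0, 1, 2 (copies of one allele).

    genotypes: (n_individuals, n_snps) integer array; returns (n_snps, 3).
    """
    counts = np.stack([(genotypes == g).sum(axis=0) for g in (0, 1, 2)], axis=1)
    return counts / genotypes.shape[0]


def hwe_genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies for allele frequencies p: (n_snps, 3)."""
    return np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=1)


def log_genotype_ratio(person, freq_study, freq_ref, eps=1e-6):
    """Assumed form: sum over SNPs of log f_study(g_j) - log f_ref(g_j).

    eps (hypothetical) floors zero frequencies so the logarithm stays finite.
    """
    idx = np.arange(person.shape[0])
    f_study = np.clip(freq_study[idx, person], eps, None)
    f_ref = np.clip(freq_ref[idx, person], eps, None)
    return float(np.sum(np.log(f_study) - np.log(f_ref)))


def homer_style_distance(person, allele_study, allele_ref):
    """Homer-style score: sum of |y - p_ref| - |y - p_study|, with y = g / 2."""
    y = person / 2.0
    return float(np.sum(np.abs(y - allele_ref) - np.abs(y - allele_study)))


# Simulated data (hypothetical): independent SNPs under Hardy-Weinberg equilibrium.
rng = np.random.default_rng(0)
n_snps, n_study, n_outsiders = 20_000, 200, 50
p = rng.uniform(0.2, 0.8, size=n_snps)                      # population allele frequencies
study = rng.binomial(2, p, size=(n_study, n_snps))           # study participants
outsiders = rng.binomial(2, p, size=(n_outsiders, n_snps))   # non-participants

freq_study = genotype_freqs(study)
freq_ref = hwe_genotype_freqs(p)      # assumption: reference frequencies known exactly
allele_study = study.mean(axis=0) / 2.0

member_t = log_genotype_ratio(study[0], freq_study, freq_ref)
outsider_t = [log_genotype_ratio(x, freq_study, freq_ref) for x in outsiders]
member_d = homer_style_distance(study[0], allele_study, p)
outsider_d = [homer_style_distance(x, allele_study, p) for x in outsiders]

print(f"log-ratio score: participant {member_t:.1f}, outsiders' mean {np.mean(outsider_t):.1f}")
print(f"Homer-style score: participant {member_d:.2f}, outsiders' mean {np.mean(outsider_d):.2f}")
```

In this simulation both scores tend to be larger for a study participant than for outsiders, because a participant's own genotypes pull the study frequencies toward them. Passing case and control genotype frequencies in place of `freq_study` and `freq_ref` would give an analogous log-ratio score for phenotype. That is also an assumption about the form, not a reproduction of the paper.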

Figure 3: Sensitivity and specificity of T_geno applied to GWAS data. Log-scale receiver operating characteristic (ROC) curves of T_geno with Illumina HumanHap550 data from GWAS scenarios with 1,000/1,000 and 5,000/5,000 cases and controls of European descent.

Using these data, GWAS scenarios were explored by selecting subsets of cases and controls, estimating the genotype frequencies of each group, and fitting logistic genotype-phenotype association models. In each scenario, all 13,604 individuals were tested for membership conditional on the fixed set of cases and controls chosen for that scenario. We also attempted to infer the phenotype of the selected cases and controls, given the knowledge that they participated in the study. Individuals not selected as cases or controls in a given scenario were used to estimate the null distribution of the statistic empirically. Figure 3 shows ROC curves of the empirical sensitivity and specificity for two tasks: classifying individuals as participants, and determining their phenotype given knowledge of participation. GWAS scenarios with a fixed subset of 1,000 or 5,000 cases and an equal number of controls are shown for varying numbers of randomly chosen or top-associated SNPs. These ROC curves focus on high specificity, with 1 − specificity ranging from 0.05 to 10⁻⁶ on a logarithmic scale. Supplementary Figure 1 shows ROC curves for additional GWAS scenarios with 1,000 to 5,000 cases and an equal number of controls. Supplementary Figure 2 is analogous to Supplementary Figure 1 but shows ROC curves on a linear (non-log) scale over the full range of specificity. Supplementary Figure 3 is analogous to Supplementary Figure 1 but shows ROC curves for T_allele.
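The evaluation described above sets decision thresholds from the statistic's distribution among individuals who were not selected, then reads off sensitivity among participants. The following Python sketch shows one way to compute such operating points at fixed values of 1 − specificity on a log-spaced grid like the one in Figure 3.

The following are assumptions for illustration, not the authors' procedure:
- the function name `sensitivity_at_fpr`;
- the strict `>` decision rule;
- thresholds taken from `numpy.quantile` with its default linear interpolation;
- the toy normal scores.

```python
import numpy as np


def sensitivity_at_fpr(null_scores, member_scores, target_fprs):
    """Empirical operating points at chosen values of 1 - specificity.

    null_scores: statistic for individuals known not to be in the study
    member_scores: statistic for study participants
    target_fprs: desired false-positive rates, i.e. 1 - specificity
    Returns (target_fpr, realized_fpr, sensitivity) tuples; an individual is
    called a participant when their score exceeds the threshold.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    member_scores = np.asarray(member_scores, dtype=float)
    points = []
    for fpr in target_fprs:
        # Threshold at the (1 - fpr) quantile of the empirical null distribution.
        threshold = np.quantile(null_scores, 1.0 - fpr)
        realized_fpr = float(np.mean(null_scores > threshold))
        sensitivity = float(np.mean(member_scores > threshold))
        points.append((fpr, realized_fpr, sensitivity))
    return points


# Log-spaced grid of 1 - specificity, from 0.05 down to 1e-6.
grid = np.logspace(np.log10(0.05), -6, num=8)

# Toy scores (hypothetical, not from the paper): participants shifted upward.
rng = np.random.default_rng(1)
null = rng.normal(0.0, 1.0, size=10_000)
members = rng.normal(3.0, 1.0, size=2_000)
for target, realized, sens in sensitivity_at_fpr(null, members, grid):
    print(f"1-spec target {target:.1e}  realized {realized:.1e}  sensitivity {sens:.3f}")
```

Because each threshold comes from a finite null sample, the smallest nonzero false-positive rate this approach can estimate is 1/len(null_scores). Targets below that value collapse onto essentially the same threshold, which the `realized` column makes visible.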