The effect of rare variants on inflation of the test statistics in case-control analyses.

Pirie A, Wood A, Lush M, Tyrer J, Pharoah PD - BMC Bioinformatics (2015)

Bottom Line: We compared the properties of the three most commonly used association tests (the likelihood ratio test, the Wald test and the score test) when testing rare variants for association using simulated data. We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for variants with fewer than 20 heterozygotes across the sample, regardless of the total sample size. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic, which may mask the presence of population structure.

Affiliation: Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK. ap736@medschl.cam.ac.uk.

ABSTRACT

Background: The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself, particularly in the analysis of rare variants. We compared the properties of the three most commonly used association tests (the likelihood ratio test, the Wald test and the score test) when testing rare variants for association using simulated data.

Results: We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for variants with fewer than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same threshold.

Conclusions: In a genetic association study, if a substantial proportion of the genetic variants tested have low minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic, which may mask the presence of population structure.
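The ratio described in the Background (observed median test statistic divided by its expected value) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes every test statistic follows a chi-square distribution with 1 degree of freedom under the null, and the function name `inflation_at_median` is hypothetical.

```python
from statistics import NormalDist, median

# Assumption: each association test statistic is chi-square with 1 df
# under the null. The median of that distribution is the square of the
# 75th percentile of a standard normal (about 0.4549).
EXPECTED_MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def inflation_at_median(test_statistics):
    """Observed median test statistic divided by the expected median."""
    return median(test_statistics) / EXPECTED_MEDIAN_CHI2_1DF
```

A ratio near 1 is what a well-calibrated test gives in the absence of population structure; values above 1 indicate inflation and values below 1 indicate under-inflation.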

Figure 2: The level of inflation in the test statistic evaluated at the mean is used to smooth out the variation in the median test statistic caused by the small number of contingencies. We consider how the over-dispersion ratio varies as the frequency of the variant increases. a) The over-dispersion ratio evaluated at the mean test statistic in a case–control analysis of 5,000 samples with variants with up to 50 heterozygotes. b) The over-dispersion ratio evaluated at the mean test statistic in a case–control analysis of 10,000 samples with variants with up to 50 heterozygotes.

Mentions: The use of the mean test statistic smooths out the variation observed in the median test statistic, which was caused by the small number of contingencies. The mean LRT statistic is inflated, whereas the mean Wald test statistic is under-inflated, for tests based on fewer than 20 heterozygotes (Figure 2a). The score test performs the best of the three, with an inflation estimate close to 1 across a range of heterozygote frequencies. The total sample size made little difference to the pattern produced by either measure; the most important variable was the total number of heterozygotes (Figures 1b and 2b), which depends on allele frequency and sample size. The results from the analysis of variants of a specified allele frequency show that variants with fewer than 20 heterozygotes are likely to cause inflation of the test statistic.
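The comparison at the mean can be illustrated with a small null simulation. This is a sketch under stated assumptions, not the authors' simulation: carrier status is treated as a binary predictor, so the three statistics reduce to closed forms on a 2x2 table (score = Pearson chi-square, LRT = G statistic, Wald = squared log odds ratio over its variance); each heterozygote is assigned to the case group independently; and the Wald statistic is set to 0 when a cell is empty, standing in for the collapse of the Wald statistic when the odds ratio estimate diverges. The names `association_statistics` and `mean_inflation` are hypothetical.

```python
import math
import random


def association_statistics(a, b, c, d):
    """Return (lrt, wald, score) for a 2x2 table.

    a, b = carrier cases, carrier controls
    c, d = non-carrier cases, non-carrier controls
    """
    n = a + b + c + d
    carriers, noncarriers = a + b, c + d
    cases, controls = a + c, b + d
    if 0 in (carriers, noncarriers, cases, controls):
        return 0.0, 0.0, 0.0  # degenerate table carries no information

    # Score test: equals Pearson's chi-square for a single binary predictor.
    score = n * (a * d - b * c) ** 2 / (carriers * noncarriers * cases * controls)

    # Likelihood ratio test: the G statistic, 2 * sum(O * ln(O / E)).
    cells = ((a, carriers, cases), (b, carriers, controls),
             (c, noncarriers, cases), (d, noncarriers, controls))
    lrt = sum(2 * obs * math.log(obs * n / (row * col))
              for obs, row, col in cells if obs > 0)

    # Wald test: squared log odds ratio divided by its estimated variance.
    if min(a, b, c, d) == 0:
        wald = 0.0  # assumption: stand-in for a diverging estimate
    else:
        wald = math.log(a * d / (b * c)) ** 2 / (1 / a + 1 / b + 1 / c + 1 / d)
    return lrt, wald, score


def mean_inflation(heterozygotes, n_cases=2500, n_controls=2500,
                   reps=20000, seed=1):
    """Mean of each statistic under the null; the expected mean is 1."""
    rng = random.Random(seed)
    p_case = n_cases / (n_cases + n_controls)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        # Assumption: each heterozygote falls in the case group independently.
        a = sum(rng.random() < p_case for _ in range(heterozygotes))
        b = heterozygotes - a
        stats = association_statistics(a, b, n_cases - a, n_controls - b)
        totals = [t + s for t, s in zip(totals, stats)]
    lrt, wald, score = (t / reps for t in totals)
    return {"lrt": lrt, "wald": wald, "score": score}
```

Comparing `mean_inflation(10)` with `mean_inflation(50)` is one way to see how the three means depend on the number of heterozygotes. The sketch does not reproduce any figures from the paper.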

