Reducing bias of allele frequency estimates by modeling SNP genotype data with informative missingness.

Lin WY, Liu N - Front Genet (2012)

Bottom Line: This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. Compared with the naïve method, our method provides more accurate allele frequency estimates when the differential dropout is present.


Affiliation: Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

ABSTRACT
The presence of missing single-nucleotide polymorphism (SNP) genotypes is common in genetic studies. For studies with low-density SNPs, the most commonly used approach to dealing with genotype missingness is to simply remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists in even carefully designed studies, such as the data from the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumption of Hardy-Weinberg equilibrium and no genotyping error, we here propose a statistical method to model the differential dropout among different genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when the differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
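
To make the bias mechanism concrete, the following minimal sketch computes the value that the naïve (complete-case) allele frequency estimate would converge to under Hardy-Weinberg equilibrium when heterozygotes and homozygotes drop out at different rates. The function name and the example dropout rates are illustrative assumptions, not values from the paper, and this is not the authors' proposed estimator.

```python
def naive_maf_expected(maf, drop_het, drop_hom):
    """Large-sample value of the naive (complete-case) MAF estimate.

    Assumes Hardy-Weinberg equilibrium, no genotyping error, and a
    single dropout rate shared by both homozygote classes
    (an illustrative simplification).
    """
    p = maf
    q = 1.0 - p
    # Hardy-Weinberg genotype frequencies: minor homozygote, heterozygote, major homozygote
    g_minor, g_het, g_major = p * p, 2.0 * p * q, q * q
    # Proportion of each genotype that is still observed after dropout
    o_minor = g_minor * (1.0 - drop_hom)
    o_het = g_het * (1.0 - drop_het)
    o_major = g_major * (1.0 - drop_hom)
    observed = o_minor + o_het + o_major
    # Allele counting among the observed genotypes only
    return (o_minor + 0.5 * o_het) / observed


if __name__ == "__main__":
    print(naive_maf_expected(0.1, 0.05, 0.05))  # equal dropout rates
    print(naive_maf_expected(0.1, 0.20, 0.02))  # heterozygotes drop out more often
    print(naive_maf_expected(0.1, 0.02, 0.20))  # homozygotes drop out more often
```

With equal dropout rates the naïve estimate recovers the true MAF. In this sketch it falls below the true value when heterozygotes are missing more often, and rises above it when homozygotes are missing more often.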


Figure 2: The box-and-whiskers plots of 1,000 estimates of MAF, given MAF = 0.1 and the fixation index f = 0.1. The different panels in the figure are arranged so that the overall genotype missing rate (P.drop) is 0.02, 0.05, 0.1, and 0.15 (from top to bottom) and the DDR (r.drop) is 0.25, 0.5, 1, 2.5, 5, and 10 (from left to right). Below each panel, we list the mean of the 1,000 P values of the exact test for Hardy–Weinberg equilibrium for the 1,000 replications.
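
The panel layout in the caption corresponds to a 4 × 6 grid of simulation settings. The sketch below shows one way to turn an overall missing rate (P.drop) and a DDR (r.drop) into per-genotype dropout rates. It assumes, for illustration only, that the DDR is the heterozygote missing rate divided by the homozygote missing rate; this excerpt does not define the direction of the ratio, and the function name `dropout_rates` is hypothetical. The heterozygote frequency uses the standard fixation-index form 2pq(1 − f), also an assumption here.

```python
def dropout_rates(p_drop, ddr, het_freq):
    """Split an overall missing rate into (heterozygote, homozygote) rates.

    Assumption (not confirmed by the source): ddr = het rate / hom rate.
    Solves het_freq * d_het + (1 - het_freq) * d_hom = p_drop
    subject to d_het = ddr * d_hom.
    """
    hom_freq = 1.0 - het_freq
    drop_hom = p_drop / (hom_freq + ddr * het_freq)
    drop_het = ddr * drop_hom
    if not (0.0 <= drop_het <= 1.0 and 0.0 <= drop_hom <= 1.0):
        raise ValueError("settings imply a dropout rate outside [0, 1]")
    return drop_het, drop_hom


# Settings listed in the caption of Figure 2
P_DROP = [0.02, 0.05, 0.1, 0.15]   # panel rows, top to bottom
DDR = [0.25, 0.5, 1, 2.5, 5, 10]   # panel columns, left to right
MAF, F = 0.1, 0.1

# Heterozygote frequency under the standard fixation-index parameterization
het_freq = 2.0 * MAF * (1.0 - MAF) * (1.0 - F)

grid = {(pd, r): dropout_rates(pd, r, het_freq) for pd in P_DROP for r in DDR}

if __name__ == "__main__":
    for (pd, r), (d_het, d_hom) in grid.items():
        print(f"P.drop={pd:<5} r.drop={r:<5} het={d_het:.4f} hom={d_hom:.4f}")
```

If the DDR is in fact defined the other way round, swapping the roles of the two rates in `dropout_rates` is the only change needed.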

Mentions: where f is the fixation index (Weir, 1996; Wakefield, 2010), a measure of the departure from HWE. When f = 0, there is no departure from HWE. The further f is from 0, the greater the degree of Hardy–Weinberg disequilibrium (HWD). When f is positive, the departure from HWE results in excess homozygosity; when f is negative, it results in excess heterozygosity. We simulated a SNP with a MAF of 0.1. The total sample size was set at 2,000. Following the fixation index settings that Chen and Kao (2006) used to examine the sensitivity of their method to the assumption of HWE, we evaluated the performance of our method with fixation index f = 0.1 and 0.2. Figures 2 and 3 present the box-and-whiskers plots of the 1,000 estimates of allele frequencies when the fixation index f = 0.1 and 0.2, respectively. Our method introduces an upward bias in the allele frequency estimates when f > 0, and a downward bias when f < 0 (result not shown). Our method is therefore not very robust to violations of the HWE assumption, and it should be applied with caution when HWE is in doubt.
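
The equation that this passage refers to is not reproduced in this excerpt. As a hedged illustration, the sketch below uses the standard fixation-index parameterization of genotype frequencies (minor homozygote p² + fpq, heterozygote 2pq(1 − f), major homozygote q² + fpq), which is assumed here to match the authors' setup. The function names, genotype labels, and random seed are hypothetical choices made for the example.

```python
import random
from collections import Counter


def genotype_freqs(maf, f):
    """Genotype frequencies under the standard fixation-index model.

    f = 0 gives Hardy-Weinberg proportions; f > 0 shifts mass from
    heterozygotes to both homozygote classes, and f < 0 does the reverse.
    """
    p = maf
    q = 1.0 - p
    return {
        "aa": p * p + f * p * q,       # minor-allele homozygote
        "Aa": 2.0 * p * q * (1.0 - f),  # heterozygote
        "AA": q * q + f * p * q,       # major-allele homozygote
    }


def simulate_counts(maf, f, n=2000, seed=0):
    """Draw n genotypes from the model above and return their counts."""
    rng = random.Random(seed)
    freqs = genotype_freqs(maf, f)
    draws = rng.choices(list(freqs), weights=list(freqs.values()), k=n)
    return Counter(draws)


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.2):
        print(f, genotype_freqs(0.1, f), dict(simulate_counts(0.1, f)))
```

Under this parameterization the allele frequency itself is unchanged by f; only the split between heterozygotes and homozygotes moves, which is why a method that assumes HWE can misread the shift.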

