Reducing bias of allele frequency estimates by modeling SNP genotype data with informative missingness.

Lin WY, Liu N - Front Genet (2012)

Bottom Line: The naïve method of removing observations with missing genotypes is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. Compared with the naïve method, our method provides more accurate allele frequency estimates when differential dropout is present.


Affiliation: Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

ABSTRACT
Missing single-nucleotide polymorphism (SNP) genotypes are common in genetic studies. For studies with low-density SNPs, the most commonly used approach to genotype missingness is simply to remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists even in carefully designed studies, such as the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumptions of Hardy-Weinberg equilibrium and no genotyping error, we propose a statistical method to model the differential dropout among genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
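The bias of the naïve method can be illustrated with a few lines of arithmetic. The sketch below is not taken from the paper: it only assumes Hardy-Weinberg equilibrium, no genotyping error, one missing rate shared by both homozygotes and a separate one for heterozygotes. The function name and the numeric values are hypothetical, chosen for illustration.

```python
def naive_expected_frequency(p, alpha_hom, alpha_het):
    """Large-sample value of the naive allele-frequency estimate for allele A.

    Assumptions (illustrative): Hardy-Weinberg equilibrium, no genotyping
    error, missing rate alpha_hom for both homozygotes and alpha_het for
    heterozygotes.
    """
    q = 1.0 - p
    # Probability that a genotype occurs AND is successfully called.
    obs_AA = p * p * (1.0 - alpha_hom)
    obs_Aa = 2.0 * p * q * (1.0 - alpha_het)
    obs_aa = q * q * (1.0 - alpha_hom)
    observed = obs_AA + obs_Aa + obs_aa
    # Naive estimator: count A alleles among the called genotypes only.
    return (2.0 * obs_AA + obs_Aa) / (2.0 * observed)


true_p = 0.2  # hypothetical true frequency of allele A
equal_rates = naive_expected_frequency(true_p, 0.05, 0.05)
differential = naive_expected_frequency(true_p, 0.02, 0.10)
print(f"equal missing rates:  {equal_rates:.4f}")
print(f"differential dropout: {differential:.4f}")
```

With equal missing rates the call rate cancels and the naïve estimate recovers the true frequency. When heterozygotes drop out more often, the estimate for the less common allele is pulled below its true value in this example.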


© Copyright Policy - open-access

Figure 4: Histograms of missing rates for homozygotes and heterozygotes in the HapMap Chinese and Japanese samples.

Mentions: We further applied our method to the HapMap data (HapMap, 2005). We downloaded the chromosome 17 genotype data of the 45 Chinese and 44 Japanese individuals, released in October 2005 (HapMap, 2005). After removing SNPs without missing genotypes, we used our method to estimate the missing rate of homozygotes (αHom) and the missing rate of heterozygotes (αHet) for each remaining SNP. Figure 4 presents the histograms of the estimated αHom and αHet for the Chinese and Japanese samples, respectively. The missing rates of heterozygotes are generally larger than those of homozygotes (DDR > 1) in both the Chinese and Japanese samples. Hao and Cawley (2007) used Affymetrix genotypes that present no evidence of DDR as a benchmark and obtained an estimate of rdrop of 1.73 in the HapMap data. With our method, we estimate rdrop as 1.81 and 1.97 for the Chinese and Japanese samples, respectively.
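To make the quantities αHom, αHet and their ratio concrete, here is a minimal sketch of how they could be recovered from genotype counts at a single SNP. It is not the authors' estimation procedure, which this page does not describe. It is a simple closed-form moment solution under the same stated assumptions (Hardy-Weinberg equilibrium, no genotyping error), plus the assumption that both homozygotes share one missing rate. The function name and the counts are hypothetical, and the ratio αHet/αHom is used as the dropout ratio because that is how the text above reads.

```python
import math


def moment_estimates(n_AA, n_Aa, n_aa, n_missing):
    """Closed-form estimates of (p, alpha_hom, alpha_het) for one SNP.

    Illustrative only: assumes HWE, no genotyping error, and a common
    missing rate for both homozygotes. In small samples the rate
    estimates can fall outside [0, 1]; no truncation is applied here.
    """
    if n_AA == 0 or n_aa == 0:
        raise ValueError("both homozygote counts must be positive")
    n = n_AA + n_Aa + n_aa + n_missing
    # Both homozygotes share the same call rate, so it cancels in
    # n_AA / n_aa, which therefore estimates (p / q) ** 2 under HWE.
    root_AA, root_aa = math.sqrt(n_AA), math.sqrt(n_aa)
    p = root_AA / (root_AA + root_aa)
    q = 1.0 - p
    # Called fraction of each genotype class divided by its HWE frequency
    # gives the call rate; one minus that is the missing rate.
    alpha_hom = 1.0 - (n_AA / n) / (p * p)
    alpha_het = 1.0 - (n_Aa / n) / (2.0 * p * q)
    return p, alpha_hom, alpha_het


# Hypothetical counts: called AA, Aa, aa genotypes and missing calls.
p_hat, a_hom, a_het = moment_estimates(392, 2880, 6272, 456)
naive = (2 * 392 + 2880) / (2 * (392 + 2880 + 6272))
print(f"adjusted p: {p_hat:.4f}, naive p: {naive:.4f}")
print(f"alpha_hom: {a_hom:.4f}, alpha_het: {a_het:.4f}")
print(f"alpha_het / alpha_hom: {a_het / a_hom:.2f}")
```

The ratio in the last line is only defined when the estimated αHom is positive, which is one reason SNPs without any missing genotypes have to be set aside before computing it.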

