Reducing bias of allele frequency estimates by modeling SNP genotype data with informative missingness.

Lin WY, Liu N - Front Genet (2012)

Bottom Line: This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. Compared with the naïve method, our method provides more accurate allele frequency estimates when the differential dropout is present.


Affiliation: Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

ABSTRACT
Missing single-nucleotide polymorphism (SNP) genotypes are common in genetic studies. For studies with low-density SNPs, the most commonly used approach to genotype missingness is simply to remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often differs in its ability to genotype heterozygotes and homozygotes, causing "differential dropout": the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists even in carefully designed studies, such as the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumptions of Hardy-Weinberg equilibrium and no genotyping error, we propose a statistical method to model the differential dropout among genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
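To make the bias of the naïve method concrete, the following minimal Python sketch computes the allele frequency that complete-case analysis would be expected to report under Hardy-Weinberg equilibrium when homozygotes and heterozygotes go missing at different rates. It illustrates the problem only and is not the authors' estimation method; the function name `naive_allele_freq` and the example rates are hypothetical.

```python
def naive_allele_freq(p, miss_hom, miss_het):
    """Expected naive (complete-case) estimate of the frequency of allele A.

    Assumptions (illustrative, not taken from the paper's model):
      - genotype frequencies follow Hardy-Weinberg equilibrium,
      - both homozygote classes share one missing rate (miss_hom),
      - heterozygotes have their own missing rate (miss_het),
      - there is no genotyping error.
    """
    q = 1.0 - p
    # Expected fraction of the sample observed in each genotype class
    obs_AA = p * p * (1.0 - miss_hom)
    obs_Aa = 2.0 * p * q * (1.0 - miss_het)
    obs_aa = q * q * (1.0 - miss_hom)
    total = obs_AA + obs_Aa + obs_aa
    # Allele counting on the observed genotypes only
    return (obs_AA + 0.5 * obs_Aa) / total


if __name__ == "__main__":
    true_p = 0.2  # hypothetical true allele frequency
    # Equal missing rates: the naive estimate recovers the true frequency
    print(naive_allele_freq(true_p, 0.05, 0.05))
    # Heterozygotes drop out more often: the naive estimate falls below 0.2
    print(naive_allele_freq(true_p, 0.02, 0.10))
```

In this sketch, equal missing rates leave the naïve estimate unbiased, whereas a higher dropout rate for heterozygotes pushes the estimated frequency of the less common allele downward. At a true frequency of 0.5 the two homozygote classes balance and the bias vanishes.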



Figure 5: Missing rates for homozygotes and heterozygotes in the HapMap Chinese and Japanese samples. Each interval shows the point estimate ± standard error of the missing rate for homozygotes or for heterozygotes of a SNP.

Mentions: Figure 5 shows the interval of point estimate ± standard error of the missing rate for each HapMap SNP on chromosome 17. If we approximate the confidence intervals of the missing rates by the point estimates ± 2 × standard errors, the intervals for homozygotes and for heterozygotes overlap at every SNP. The missing rates for homozygotes and for heterozygotes are therefore not significantly different for these SNPs. This is not unexpected given the small sample sizes of the HapMap data (∼45).
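The overlap check described above can be sketched in a few lines of Python. This is a hedged illustration: the rates are hypothetical, and the simple binomial standard error used here is an assumption made for the example, since the paper's standard errors come from its own statistical model.

```python
import math


def proportion_se(rate, n):
    """Binomial standard error of a proportion (an assumption for this sketch)."""
    return math.sqrt(rate * (1.0 - rate) / n)


def approx_interval(rate, se, k=2.0):
    """Approximate confidence interval: point estimate +/- k standard errors."""
    return (rate - k * se, rate + k * se)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def rates_look_different(rate_hom, rate_het, n):
    """Flag a SNP only when the two approximate intervals do not overlap."""
    ci_hom = approx_interval(rate_hom, proportion_se(rate_hom, n))
    ci_het = approx_interval(rate_het, proportion_se(rate_het, n))
    return not intervals_overlap(ci_hom, ci_het)


if __name__ == "__main__":
    # Hypothetical missing rates: 2% for homozygotes, 8% for heterozygotes
    print(rates_look_different(0.02, 0.08, n=45))    # small sample
    print(rates_look_different(0.02, 0.08, n=5000))  # large sample
```

With these hypothetical rates, a sample of about 45 gives wide, overlapping intervals, while the same rates in a much larger sample give intervals that separate. This matches the remark that non-significant differences are expected at the HapMap sample sizes.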

