SAQC: SNP array quality control.

Yang HC, Lin HC, Kang M, Chen CH, Lin CW, Li LH, Wu JY, Chen YT, Pan WH - BMC Bioinformatics (2011)

Bottom Line: This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays.


Affiliation: Institute of Statistical Science, Academia Sinica, Taipei, Taiwan. hsinchou@stat.sinica.edu.tw

ABSTRACT

Background: Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed.

Results: We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify the departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. In several large genomic studies that we evaluated empirically, the proposed quality indices followed lognormal distributions. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples.

Conclusions: This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).


Figure 4: Detection rate of quality indices in the simulation study based on the Affymetrix 500K SNP arrays. Averages and standard deviations of detection rates of the genotype-based and nearest-mean-based quality indices {Q1(ρ), Q2(ρ), ρ = 95%, 97.5%, 99%} for a relative experimental error r of 0-60%. The data were generated from the TWN population.

Mentions: We defined the detection rate as the proportion of poor-quality SNP arrays detected by the proposed confidence interval method according to a 95%, 97.5%, or 99% quantile of the quality index. We calculated the mean and standard deviation of the detection rates across 1,000 simulations at relative experimental errors (r) from 0 to 0.6 in increments of 0.025. Results for the Affymetrix 100K and Affymetrix 500K Sets based on the TWN population are shown in Figure 3 and Figure 4.
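
The detection-rate summary described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAQC implementation (SAQC itself is written in R): the function names are hypothetical, the bound is taken as a one-sided upper quantile of a lognormal distribution fitted to reference quality indices, and the error model in `simulate` (inflating the index by a factor of 1 + 5r) is a toy stand-in for the simulation design of the paper, which is not given in this excerpt.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def lognormal_upper_bound(reference_indices, rho):
    """Upper rho-quantile of a lognormal fitted to reference quality indices.

    Assumption: larger index values indicate poorer quality, so only an
    upper bound is used.
    """
    logs = [math.log(q) for q in reference_indices]
    mu, sigma = mean(logs), stdev(logs)
    return math.exp(mu + NormalDist().inv_cdf(rho) * sigma)


def detection_rate(indices, bound):
    """Proportion of arrays whose quality index exceeds the bound."""
    return sum(q > bound for q in indices) / len(indices)


def error_grid(start=0.0, stop=0.6, step=0.025):
    """Relative experimental errors r = 0, 0.025, ..., 0.6."""
    n = round((stop - start) / step)
    return [round(start + i * step, 3) for i in range(n + 1)]


def simulate(rho, r, n_sim=1000, n_arrays=50, seed=0):
    """Mean and standard deviation of detection rates over n_sim runs.

    Toy error model (an assumption for illustration only): a degraded
    array's index is a lognormal draw multiplied by (1 + 5 * r).
    """
    rng = random.Random(seed)
    reference = [rng.lognormvariate(0.0, 0.25) for _ in range(2000)]
    bound = lognormal_upper_bound(reference, rho)
    rates = []
    for _ in range(n_sim):
        degraded = [rng.lognormvariate(0.0, 0.25) * (1 + 5 * r)
                    for _ in range(n_arrays)]
        rates.append(detection_rate(degraded, bound))
    return mean(rates), stdev(rates)


if __name__ == "__main__":
    for rho in (0.95, 0.975, 0.99):
        for r in error_grid():
            avg, sd = simulate(rho, r, n_sim=100)
            print(f"rho={rho:.3f} r={r:.3f} mean={avg:.3f} sd={sd:.3f}")
```

In this sketch the detection rate at r = 0 approximates the false-positive rate implied by the chosen quantile, and it rises toward 1 as r grows; how quickly it rises depends entirely on the assumed error model.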

