Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays.

Lin S, Carvalho B, Cutler DJ, Arking DE, Chakravarti A, Irizarry RA - Genome Biol. (2008)

Bottom Line: We find CRLMM to be more accurate than the Affymetrix default programs (BRLMM and Birdseed). Also, we tie our call confidence metric to percent accuracy. We intend that our validation datasets and methods, referred to as SNPaffycomp, serve as standard benchmarks for future SNP calling algorithms.


Affiliation: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, N. Broadway, Baltimore, MD 21205, USA.

ABSTRACT
Multiple algorithms have been developed for the purpose of calling single nucleotide polymorphisms (SNPs) from Affymetrix microarrays. We extend and validate the algorithm CRLMM, which incorporates HapMap information within an empirical Bayes framework. We find CRLMM to be more accurate than the Affymetrix default programs (BRLMM and Birdseed). Also, we tie our call confidence metric to percent accuracy. We intend that our validation datasets and methods, referred to as SNPaffycomp, serve as standard benchmarks for future SNP calling algorithms.

Figure 2: Accuracy prediction plots for Affymetrix first pass Sty HapMap samples. (a) A histogram of the BRLMM confidence measure is plotted for a sample chip with an average accuracy lower than 33% called by either BRLMM or CRLMM. (b) The graph shows a scatter plot of average accuracy of chips as called by BRLMM versus SNR. The y-axis is in the logit scale; the x-axis, the log scale.

Mentions: CRLMM allows for the identification of poor quality chips. It is well known that including poor quality chips in a dataset may distort calling algorithms to such a degree that mistaken calls are made even on high quality chips. Therefore, identifying and excluding poor quality chips is vital in any analysis. In this regard, BRLMM proves inadequate: a summary statistic based on BRLMM confidence metrics does not accurately reflect chip quality. As an example, consider one of the samples in the Affymetrix first pass Sty data; measured against the HapMap as the gold standard, it has an average accuracy below 33% whether it is called by BRLMM or CRLMM. With three possible genotype calls per SNP (AA, AB, BB), this degree of accuracy can be achieved by guessing, which implies that the array provides no information. Yet Figure 2a shows that BRLMM calls 10,000 SNPs at a very high confidence level (confidence measure >0.95). The implication is that the BRLMM confidence measure cannot be used to gauge the overall quality of a chip, because its meaning is distorted for poor quality chips; in fact, Affymetrix suggests using DM to exclude poor quality chips before applying BRLMM. On the other hand, the signal-to-noise ratio (SNR) measure we have developed (see Materials and methods) is an excellent predictor of chip-specific accuracy (Figure 2b).
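
The guessing baseline in the passage above can be checked with a small simulation. The following is a minimal sketch, not part of CRLMM, BRLMM, or SNPaffycomp: the function names, the uniform choice among three genotypes, and the simulated truth vector are all assumptions made for illustration. It shows that calls drawn at random agree with a reference about one third of the time, and how the share of calls above a confidence cutoff (0.95, as in Figure 2a) could be tallied for one chip.

```python
import random

# The three diploid genotype calls for a biallelic SNP.
GENOTYPES = ("AA", "AB", "BB")


def guessing_accuracy(n_snps, seed=0):
    """Accuracy of uniformly random calls against a reference.

    Hypothetical helper: the reference genotypes are simulated here,
    standing in for a gold standard such as HapMap.
    """
    rng = random.Random(seed)
    truth = [rng.choice(GENOTYPES) for _ in range(n_snps)]
    calls = [rng.choice(GENOTYPES) for _ in range(n_snps)]
    matches = sum(t == c for t, c in zip(truth, calls))
    return matches / n_snps


def high_confidence_fraction(confidences, threshold=0.95):
    """Share of calls whose confidence strictly exceeds the threshold."""
    return sum(c > threshold for c in confidences) / len(confidences)


if __name__ == "__main__":
    # Random guessing lands close to one third.
    print(round(guessing_accuracy(100_000), 3))
    # Illustrative confidence values only, not data from the paper.
    print(high_confidence_fraction([0.99, 0.50, 0.96, 0.95]))
```

Because the guessed calls are independent of the reference, the expected agreement is one third whatever the true genotype frequencies are. A chip can therefore have many calls above the cutoff while its accuracy stays at the guessing level, which is why a per-call confidence summary alone is a poor quality indicator.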

