A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip whole-genome resequencing platform.

Pandya GA, Holmes MH, Sunkara S, Sparks A, Bai Y, Verratti K, Saeed K, Venepally P, Jarrahi B, Fleischmann RD, Peterson SN - Nucleic Acids Res. (2007)

Bottom Line: While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. A set of bioinformatic filters was developed to target systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.


Affiliation: Pathogen Functional Genomics Resource Center, The Institute for Genomic Research at the J. Craig Venter Institute, Rockville, MD 20850, USA.

ABSTRACT
DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. A set of bioinformatic filters was developed to target systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.


Figure 3: ROC curve illustrating the effect of different quality threshold values on the true positive and false positive rates. The GSEQ quality score threshold was set to 3.0, and our quality filter was applied using different threshold values shown on the line graph.

Mentions: The next filter in our pipeline is a quality filter that simply eliminates SNP calls that have been assigned low quality scores by the GSEQ software. The quality score is based on the difference in signal intensity between the highest-intensity probe pair and the next-highest-intensity pair at a particular locus (4), so calls with low quality scores are more likely to be incorrect than high-scoring calls. We have found that filtering out SNP calls with quality scores less than 12.0 removes a large number of false positives, at a relatively small cost in terms of true positives rejected. A receiver operating characteristic (ROC) curve that illustrates the effect of different quality threshold values is shown in Figure 3. (For the analysis in Figure 3 only, we used our own quality filter in preference to the quality filter in the GSEQ software, so that we could easily test the effect of different quality thresholds. For all other analyses, the quality filter incorporated in GSEQ was used. The GSEQ software is run before our filters, so the quality filter was actually the first filter applied, except in the case of Figure 3.)
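
The threshold sweep described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `SnpCall` record, the function names, and the toy data in `EXAMPLE_CALLS` are all hypothetical, and the rates are computed relative to the unfiltered set of SNP calls, which may differ from how the rates in Figure 3 were defined. It assumes each call carries a GSEQ-style quality score and a known true/false label from comparison against a validated sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SnpCall:
    """Hypothetical record for one SNP call (not a GSEQ data structure)."""
    position: int       # locus on the reference sequence
    quality: float      # quality score assigned to the call
    is_true_snp: bool   # label from comparison with a validated sequence


def quality_filter(calls: List[SnpCall], threshold: float = 12.0) -> List[SnpCall]:
    """Keep calls whose quality score is at least the threshold.

    The default of 12.0 is the cutoff quoted in the text; calls scoring
    less than the threshold are rejected.
    """
    return [c for c in calls if c.quality >= threshold]


def roc_points(calls: List[SnpCall],
               thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, true positive rate, false positive rate) tuples.

    Assumption: rates are fractions of the unfiltered true and false
    calls that survive the filter at each threshold.
    """
    total_true = sum(c.is_true_snp for c in calls)
    total_false = len(calls) - total_true
    points = []
    for t in thresholds:
        kept = quality_filter(calls, t)
        tp = sum(c.is_true_snp for c in kept)
        fp = len(kept) - tp
        tpr = tp / total_true if total_true else 0.0
        fpr = fp / total_false if total_false else 0.0
        points.append((t, tpr, fpr))
    return points


# Toy data, invented purely for illustration.
EXAMPLE_CALLS = [
    SnpCall(101, 25.0, True),
    SnpCall(250, 14.5, True),
    SnpCall(377, 9.0, True),
    SnpCall(412, 4.0, False),
    SnpCall(530, 11.0, False),
    SnpCall(644, 13.0, False),
]

if __name__ == "__main__":
    for t, tpr, fpr in roc_points(EXAMPLE_CALLS, [3.0, 12.0, 20.0]):
        print(f"threshold={t:5.1f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Raising the threshold moves along the curve toward fewer false positives at the cost of rejecting more true positives, which is the trade-off the quoted 12.0 cutoff is meant to balance.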

