A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip whole-genome resequencing platform.

Pandya GA, Holmes MH, Sunkara S, Sparks A, Bai Y, Verratti K, Saeed K, Venepally P, Jarrahi B, Fleischmann RD, Peterson SN - Nucleic Acids Res. (2007)

Bottom Line: While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. A set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence were developed. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.


Affiliation: Pathogen Functional Genomics Resource Center, The Institute for Genomic Research at the J. Craig Venter Institute, Rockville, MD 20850, USA.

ABSTRACT
DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. A set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence were developed. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.
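
The trade-off reported in the abstract (91% of false-positive SNP calls removed, 10.7% of true positives lost) can be illustrated with a short calculation. The sketch below is only an illustration: the function name `apply_filter` and the input counts are hypothetical and are not taken from the paper, which does not give raw call counts in this text. Only the two removal fractions come from the abstract.

```python
def apply_filter(false_pos, true_pos, fp_removed=0.91, tp_removed=0.107):
    """Estimate SNP call counts and precision before and after filtering.

    fp_removed and tp_removed default to the fractions quoted in the
    abstract; false_pos and true_pos are supplied by the caller.
    """
    fp_after = false_pos * (1 - fp_removed)   # false positives that survive
    tp_after = true_pos * (1 - tp_removed)    # true positives that survive
    precision_before = true_pos / (true_pos + false_pos)
    precision_after = tp_after / (tp_after + fp_after)
    return fp_after, tp_after, precision_before, precision_after


# Hypothetical input counts chosen for illustration only (NOT from the paper).
fp_after, tp_after, p_before, p_after = apply_filter(false_pos=1000, true_pos=1000)
print(f"false positives remaining: {fp_after:.0f}")
print(f"true positives remaining:  {tp_after:.0f}")
print(f"precision before filter:   {p_before:.3f}")
print(f"precision after filter:    {p_after:.3f}")
```

With any starting counts, the share of SNP calls that are true rises after filtering, because false positives are removed at a much higher rate than true positives.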



Figure 5: Schematic representation of whole genome resequencing array set design. Blue vertical lines indicate repeats in the genomes. Unique sequences for LVS and SCHU S4 are shown as red and green vertical lines, respectively. Similarly, yellow and purple vertical lines represent unique sequences from plasmids pOM1 and pFNL10, respectively.

Mentions: A schematic representation of the chip design used to represent the F. tularensis genome is depicted in Figure 5. The F. tularensis LVS genome sequence defined our reference and was represented on chips A–E and the majority of chip F. Unique sequences present in strain SCHU S4, together with two plasmid sequences, were added to the remainder of chip F. Our chip design, based on sequence information from two strains, enables coverage of a large number of strains. Approximately 91% of the F. tularensis double-stranded unique genome can be resequenced with this design from strains belonging to holarctica (type B) and tularensis (type A) subtypes.

