Correcting for sequence biases in present/absent calls.

Schuster EF, Blanc E, Partridge L, Thornton JM - Genome Biol. (2007)

Bottom Line: The probe sequence of short oligonucleotides in Affymetrix microarray experiments can have a significant influence on present/absent calls of probesets with absent target transcripts. Probesets enriched for central Ts and depleted of central As in the perfect-match probes tend to be falsely classified as having present transcripts. Correction of non-specific binding for both perfect-match and mismatch probes using probe-sequence models can partially remove the probe-sequence bias and result in better performance of the MAS 5.0 algorithm.


Affiliation: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK. schuster@ebi.ac.uk

ABSTRACT
The probe sequence of short oligonucleotides in Affymetrix microarray experiments can have a significant influence on present/absent calls of probesets with absent target transcripts. Probesets enriched for central Ts and depleted of central As in the perfect-match probes tend to be falsely classified as having present transcripts. Correction of non-specific binding for both perfect-match and mismatch probes using probe-sequence models can partially remove the probe-sequence bias and result in better performance of the MAS 5.0 algorithm.



Figure 3: AUC performance for present/absent calls. AUC scores for 301 methods to generate probeset expression values (see Materials and methods and [8] for more information) based on the mean log2 value of each probeset for control (C) samples (rainbow colors as in legend). The performance of a method for spiked-in (S) samples (gray) is shown in the same column. True positives are probesets that can be aligned to the DGC clones that were used to create the GoldenSpike dataset. False positives are the remaining empty probesets. The horizontal lines indicate the AUC scores for the mean MAS 5.0 present/absent P value for C replicates (blue) and S replicates (gray).

Given that the probe sequence is important for present/absent calls and probe signal intensity, we compared 301 different methods to generate probeset expression values to determine if an expression value cutoff could be used to classify probesets. The majority of methods were based on three different methods for correction of PM values: the robust multichip average (RMA) background correction method [9], in which an estimated background signal is subtracted from all PM probes; the MAS 5.0 method, in which the MM probe intensity is subtracted from its partner PM probe to correct for non-specific binding (NSB); and the GC-RMA method [10], in which PM probe intensities are transformed based on estimates of NSB and probe sequence biases in MM probes. The background/NSB corrections were combined with eight methods for normalization at the probe level, six methods to summarize probe values into probeset values, and loess or variance stabilization normalization [11] at the probeset level (see Materials and methods and [8] for more information). Performance of a method was based on ROC curves, where the rate of finding true positives (bound probesets) is compared to the rate of finding false positives (empty probesets), and performance scores are the area under the ROC curves (AUC). We observed that methods that used probe-sequence-based corrections for non-specific binding (GC-RMA [10] and position di-nucleotide nearest neighbor [12] methods) outperformed the other methods and the MAS 5.0 present/absent algorithm. We also observed that the method of background/NSB correction influences performance much more than normalization and summarization methods, and that probeset normalization has very little effect on performance (Figure 3).
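The AUC score described above can be illustrated with a minimal sketch. It assumes each probeset has a single score (for example, its mean log2 expression value across replicates) and a label marking it as bound (true positive) or empty (false positive). The function name `roc_auc` and the example values are hypothetical and are not taken from the paper; the rank-based (Mann-Whitney) formula used here is one standard way to compute the area under a ROC curve, not necessarily the procedure the authors used.

```python
def roc_auc(scores, is_bound):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores   -- one number per probeset; higher means "more likely present"
    is_bound -- one bool per probeset; True for bound, False for empty
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, is_bound), key=lambda p: p[0])
    n = len(pairs)
    n_pos = sum(1 for _, bound in pairs if bound)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one bound and one empty probeset")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        pos_in_run = sum(1 for k in range(i, j) if pairs[k][1])
        rank_sum_pos += avg_rank * pos_in_run
        i = j

    # U statistic for the bound group, scaled to the range [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical mean log2 values for four probesets (illustration only).
example_scores = [6.1, 7.4, 7.0, 9.8]
example_bound = [False, False, True, True]
example_auc = roc_auc(example_scores, example_bound)
```

An AUC of 1.0 means some cutoff separates bound from empty probesets perfectly, while 0.5 is what a score unrelated to the labels would give. Under this reading, comparing methods amounts to computing one such score per method on the same set of labeled probesets.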

