Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips.

Harrison AP, Johnston CE, Orengo CA - BMC Bioinformatics (2007)

Bottom Line: We wished to establish which of the calibration steps resulted in the biggest uncertainty in the sets of genes reported to be differentially expressed. Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. We also reach the same conclusion after assigning genes to be differentially expressed using t-statistics, although this approach results in a large number of false positives in the sets of genes identified, owing to the small numbers of replicates typically used in microarray experiments.

Affiliation: Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex, UK. harry@essex.ac.uk

ABSTRACT

Background: Affymetrix GeneChips are a popular platform for performing whole-genome experiments on the transcriptome. There is a range of different calibration steps, and users must choose among different background subtractions, normalisations and expression measures. We wished to establish which of the calibration steps resulted in the biggest uncertainty in the sets of genes reported to be differentially expressed.
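The abstract does not say which normalisations were compared. As an illustration of what a normalisation step does, here is a minimal sketch of one widely used option, quantile normalisation (the method used inside RMA), assuming a probes-by-arrays matrix of intensities. The function name `quantile_normalise` is hypothetical, and ties are broken by sort order rather than averaged, which is a simplification.

```python
import numpy as np


def quantile_normalise(x: np.ndarray) -> np.ndarray:
    """Quantile-normalise a probes-by-arrays matrix (hypothetical helper).

    Every array (column) is given the same intensity distribution: the mean
    of the sorted columns, placed back according to each column's own ranks.
    Ties are broken by sort order, a simplification.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # row index of each rank, per array
    reference = np.sort(x, axis=0).mean(axis=1)   # mean value at each quantile
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference           # k-th smallest gets k-th reference value
    return out
```

After this step every column has identical sorted values, so differences between arrays survive only in the ordering of probes within each array.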

Results: Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. However, the contents of the gene list are most sensitive to the choice of expression measure. This holds irrespective of whether the experiment uses a rat, mouse or human chip, and whether the chip definition is made using probe mappings from Unigene, RefSeq, Entrez Gene or the original Affymetrix definitions. It is also irrespective of whether both Present and Absent Calls, or just Present Calls, from the MAS5 algorithm are used to filter gene lists, and this conclusion holds for genes of differing intensities. We also reach the same conclusion after assigning genes to be differentially expressed using t-statistics, although this approach results in a large number of false positives in the sets of genes identified, owing to the small numbers of replicates typically used in microarray experiments.
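The abstract ranks genes by "the z-score of fold change" without spelling out the formula here. A minimal sketch, assuming log2 expression values, a log2 fold change taken as the difference of replicate means, and standardisation of that fold change across all genes, could look like this. Both function names are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np


def fold_change_z_scores(cond_a, cond_b):
    """z-score of the log2 fold change per gene (assumed definition).

    cond_a, cond_b: genes-by-replicates arrays of log2 expression values.
    The log2 fold change is the difference of replicate means; it is then
    standardised across all genes (mean 0, sample standard deviation 1).
    """
    log_fc = np.mean(cond_b, axis=1) - np.mean(cond_a, axis=1)
    return (log_fc - log_fc.mean()) / log_fc.std(ddof=1)


def top_genes(gene_ids, z, n=400):
    """Return the n gene identifiers with the largest |z| (hypothetical helper)."""
    idx = np.argsort(-np.abs(z))[:n]
    return [gene_ids[i] for i in idx]
```

With n=400 this yields a gene list of the kind compared between protocols in Figure 3.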

Conclusion: The major calibration uncertainty that biologists need to consider when analysing Affymetrix data is how their multiple probe values are condensed into one expression measure.
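To make "condensing multiple probe values into one expression measure" concrete, the sketch below contrasts two ways of summarising one probe set: Tukey median polish (the summarisation used by RMA on log2 intensities) and a plain column mean. This is an illustrative comparison under those assumptions, not the set of expression measures evaluated in the paper; `median_polish` and `mean_summary` are hypothetical names.

```python
import numpy as np


def median_polish(y, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes-by-arrays matrix of log2 intensities.

    Fits y = overall + row effect (probe) + column effect (array) + residual
    and returns overall + column effect, one summary value per array.
    """
    y = np.asarray(y, dtype=float).copy()
    overall = 0.0
    row = np.zeros(y.shape[0])
    col = np.zeros(y.shape[1])
    for _ in range(max_iter):
        r = np.median(y, axis=1)      # sweep probe (row) medians
        y -= r[:, None]
        row += r
        m = np.median(col)
        col -= m
        overall += m
        c = np.median(y, axis=0)      # sweep array (column) medians
        y -= c[None, :]
        col += c
        m = np.median(row)
        row -= m
        overall += m
        if np.max(np.abs(np.concatenate([r, c]))) < tol:
            break
    return overall + col


def mean_summary(y):
    """Naive alternative: average the probes on each array."""
    return np.asarray(y, dtype=float).mean(axis=0)
```

A single aberrant probe shifts the mean summary of its array but leaves the median-polish summary unchanged, which is one way two expression measures can rank the same genes differently.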

Figure 3: Matrices of consensus in the top 400 most significantly changing genes (Z-score) across the 45 different protocols: (a) GSE1004; (b) GSE1703; (c) GSE1873; (d) GSE2401; (e) GSE2535.

Mentions: Figure 3 shows matrices of comparisons for experiments GSE1004 [9], GSE1703 [13], GSE1873 [14], GSE2401 [15] and GSE2535 [16]. We find that differences in the expression measure are the dominant cause of disagreement in the gene lists between different protocols, irrespective of whether the experiment uses a human, rat or mouse chip, and irrespective of the number of biological replicates for each condition. If two calibration protocols share a common expression measure, then the average percent overlap in their gene lists is 83% for GSE1703, 77% for GSE1873, 82% for GSE2401 and 85% for GSE2535.
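The consensus matrices and the "average percent overlap" figures can be sketched as follows, assuming each protocol is keyed by a hypothetical tuple (background subtraction, normalisation, expression measure) and each gene list has the same length (for example the top 400). The labels used in practice are placeholders; this is not the authors' code.

```python
from itertools import combinations


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two equal-length top-N gene lists."""
    return 100.0 * len(set(list_a) & set(list_b)) / len(list_a)


def consensus_matrix(gene_lists):
    """Pairwise percent overlap between every pair of protocols."""
    return {(a, b): percent_overlap(gene_lists[a], gene_lists[b])
            for a in gene_lists for b in gene_lists}


def mean_overlap_sharing_measure(gene_lists):
    """Average overlap over protocol pairs sharing an expression measure.

    Protocol keys are assumed to be (background, normalisation, measure).
    """
    values = [percent_overlap(gene_lists[a], gene_lists[b])
              for a, b in combinations(gene_lists, 2)
              if a[2] == b[2]]
    if not values:
        raise ValueError("no protocol pairs share an expression measure")
    return sum(values) / len(values)
```

For example, with hypothetical protocols ("bg1", "norm1", "m1") and ("bg2", "norm1", "m1") whose four-gene lists share three genes, the function returns 75.0.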
