Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios.

Staaf J, Vallon-Christersson J, Lindgren D, Juliusson G, Rosenquist R, Höglund M, Borg A, Ringnér M - BMC Bioinformatics (2008)

Bottom Line: We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations. The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.


Affiliation: Department of Oncology, Clinical Sciences, Lund University, SE-22185 Lund, Sweden. johan.staaf@med.lu.se

ABSTRACT

Background: Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples.

Results: We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations.

Conclusion: The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
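As a rough illustration of what a quantile normalization between the two dye channels can look like, the sketch below forces the X and Y intensities of a single sample onto a shared reference distribution. It is a generic two-channel quantile normalization under simple assumptions (one sample, all SNPs pooled, strictly equal-length channels), not necessarily the authors' exact procedure; the function name `quantile_normalize_xy` is hypothetical.

```python
import numpy as np


def quantile_normalize_xy(x, y):
    """Give the X and Y channels of one sample an identical intensity distribution.

    Hypothetical helper: a generic two-channel quantile normalization,
    assumed here for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")

    # Reference distribution: rank-by-rank mean of the two sorted channels.
    reference = (np.sort(x) + np.sort(y)) / 2.0

    # Rank of each SNP within its own channel (ties broken by input order).
    x_rank = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    y_rank = np.argsort(np.argsort(y, kind="stable"), kind="stable")

    # Replace each intensity by the reference value at its rank.
    return reference[x_rank], reference[y_rank]


if __name__ == "__main__":
    x_qn, y_qn = quantile_normalize_xy([4.0, 1.0, 3.0, 2.0], [2.0, 8.0, 4.0, 6.0])
    print(x_qn, y_qn)
```

After this step both channels contain the same set of values, so any remaining difference between X and Y for a given SNP reflects its rank within each channel rather than a global difference between the dyes.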

Figure 2: Intensity transformations of X and Y by quantile normalization. HapMap sample NA06985 hybridized on an Infinium 370 k BeadChip is shown. SNPs have been colored based on individual genotype calls: AA (green), AB (yellow), and BB (red). SNPs without a genotype call are excluded. (a) Scatter plot of BeadStudio allele intensities X and Y. A lowess regression line for heterozygous SNPs is superimposed (solid) together with the expected X = Y line (dashed), illustrating that the dye intensity bias affects heterozygous SNPs. (b) MR plot of BeadStudio allele intensities for chromosome 8 with superimposed lowess regression lines (solid) for each genotype population and locally fitted linear regression lines (dashed blue). The mean M value for each genotype population is indicated by horizontal dashed black lines. (c) MR plot of quantile normalized allele intensities for chromosome 8 with superimposed lowess regression lines (solid black) and locally fitted linear regression lines (dashed blue) for each genotype population, separately. (d) Scatter plot of the intensity transformation XQN/X vs X from quantile normalization. SNPs are colored by genotype. SNPs with low X intensity values (predominantly genotyped as BB) are increased significantly in intensity by QN. (e) Scatter plot of the intensity transformation YQN/Y vs Y from quantile normalization. SNPs are colored by genotype. (f) Histogram of BeadStudio X intensities. (g) Histogram of BeadStudio Y intensities.

Mentions: The deviation from theta = 0.5 for heterozygous SNPs in HapMap samples indicates that an imbalance in the X and Y intensity distributions remains after QN (Table 2). The imbalance in theta affects BAF estimates through the calibration of theta into BAF using the HapMap reference genotype clusters. Part of the imbalance can be explained by an uncorrected curvature between X and Y intensities that, prior to QN, is present for both tumor samples (Figure 1e) and HapMap samples (Figure 2a). To investigate the relationship between allelic intensity ratios and overall intensity, we created MR plots, where M = log2(Y/X) and R = log10(X + Y), similar to conventional MA plots [16]. Consequently, in MR plots heterozygous SNPs should have an M value of 0. As expected from Figure 2a, curvature is present prior to QN in the MR plot of HapMap sample NA06985 for the AB, BB and AA SNP populations (Figure 2b). The curvature is highlighted by the superimposed lowess curve for each genotype population and by the slope of a linear regression line fitted through each population. After QN the curvature is reduced, although not fully removed (Figure 2c).
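The MR quantities defined above can be sketched in a few lines of NumPy. The M and R formulas are taken from the text; the theta formula, (2/pi) * arctan(Y/X), is the commonly used Illumina definition and is an assumption here, since this section does not spell it out. The function names `mr_values`, `theta` and `m_slope` are hypothetical, and intensities are assumed to be strictly positive.

```python
import numpy as np


def mr_values(x, y):
    """Return (M, R) with M = log2(Y/X) and R = log10(X + Y), as defined in the text."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.log2(y / x)      # allelic intensity ratio; 0 when X == Y
    r = np.log10(x + y)     # overall intensity on a log10 scale
    return m, r


def theta(x, y):
    """Allelic angle scaled to [0, 1]; assumed definition (2/pi) * arctan(Y/X)."""
    return (2.0 / np.pi) * np.arctan2(y, x)


def m_slope(m, r):
    """Slope of a straight line fitted to M as a function of R.

    A slope near 0 within one genotype population suggests little
    intensity-dependent bias; this mirrors the fitted linear regression
    lines described for the MR plots.
    """
    slope, _intercept = np.polyfit(r, m, 1)
    return slope


if __name__ == "__main__":
    m, r = mr_values([1.0, 2.0], [1.0, 8.0])
    print(m, r, theta(3.0, 3.0))
```

For a balanced heterozygous SNP (X equal to Y) this gives M = 0 and theta = 0.5, which is the expectation that the deviations reported in the text are measured against.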

