Supervised Lowess normalization of comparative genome hybridization data--application to lactococcal strain comparisons.

van Hijum SA, Baerends RJ, Zomer AL, Karsens HA, Martin-Requena V, Trelles O, Kok J, Kuipers OP - BMC Bioinformatics (2008)

Bottom Line: Current normalization methods do not take sequence divergence between target and microarray features into consideration and therefore cannot distinguish whether a difference in signal is due to systematic errors in the data or to sequence divergence. Since both genomes are sequenced and their gene deletions identified, the success rate of different aCGH normalization methods in detecting these deletions in the MG1363 genome was determined. S-Lowess detects 97% of the deletions, whereas other aCGH normalization methods detect at most 60% of them.


Affiliation: Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands. s.a.f.t.van.hijum@rug.nl

ABSTRACT

Background: Array-based comparative genome hybridization (aCGH) is commonly used to determine the genomic content of bacterial strains. Since prokaryotes generally have less conserved genome sequences than eukaryotes, sequence divergence between the genes of the genomes compared in an aCGH experiment obstructs the detection of genome variations (e.g. deletions). Current normalization methods do not take sequence divergence between target and microarray features into consideration and therefore cannot distinguish whether a difference in signal is due to systematic errors in the data or to sequence divergence.
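
To see why divergent features confound normalization, consider a toy example with hypothetical numbers that are not from the study. It uses simple mean centering of log2 ratios as a stand-in for a global normalization such as total signal normalization. This is neither Lowess nor the authors' procedure. It only shows how including divergent features in the estimate of the correction shifts that correction away from the true systematic error.

```python
from statistics import mean

# Hypothetical toy data: raw log2(test/reference) ratios that share a uniform
# dye bias of +0.5. Conserved features sit at the bias; divergent (e.g.
# deleted) features sit 3 log2 units lower.
DYE_BIAS = 0.5
conserved = [DYE_BIAS] * 80
divergent = [DYE_BIAS - 3.0] * 20

# Global centering: the offset is estimated from all features, so the
# divergent features drag it away from the true dye bias.
global_offset = mean(conserved + divergent)
# Subset centering: the offset is estimated from conserved features only.
subset_offset = mean(conserved)

# Residual log2 ratio left on a conserved feature after each correction.
residual_global = DYE_BIAS - global_offset
residual_subset = DYE_BIAS - subset_offset
print(f"global offset {global_offset:.2f}, conserved residual {residual_global:.2f}")
print(f"subset offset {subset_offset:.2f}, conserved residual {residual_subset:.2f}")
```

The global offset works out to (80 × 0.5 + 20 × (-2.5)) / 100 = -0.1. Every conserved feature therefore keeps a spurious +0.6 log2 ratio after correction. Estimating the offset from the conserved subset removes the bias completely. A median would resist this particular case because the divergent features are a minority. The toy only illustrates why restricting the estimate to conserved features helps.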

Results: We present supervised Lowess (S-Lowess), an application of the subset Lowess normalization method. Normalization is based on a predicted subset of array features with minimal sequence divergence between the analyzed strains. Using this subset, we remove systematic errors from dual-dye aCGH data in two steps: (1) determination of a subset of conserved genes (likely conserved genes, LCG); and (2) subset Lowess normalization using the LCG. Subset Lowess determines the correction factors for systematic errors from the subset of array features and applies these correction factors to normalize all array features. The performance of S-Lowess was assessed on aCGH experiments in which differentially labeled genomic DNA fragments of Lactococcus lactis IL1403 and L. lactis MG1363 strains were hybridized to IL1403 DNA microarrays. Since both genomes are sequenced and their gene deletions identified, the success rate of different aCGH normalization methods in detecting these deletions in the MG1363 genome was determined. S-Lowess detects 97% of the deletions, whereas other aCGH normalization methods detect at most 60% of them.
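
The two-step procedure can be sketched as follows. This is a minimal, illustrative sketch under stated assumptions, not the S-Lowess web tool's implementation. Several choices are hypothetical:

- the LCG criterion, a percent-identity threshold `min_identity`;
- the span `frac`;
- the M/A convention, which treats R as the test channel and G as the reference;
- all function names.

The Lowess here is also a single non-robust local-linear pass, without the robustness iterations of Cleveland's full procedure. The point it shows is that the intensity-dependent trend is fitted on the LCG subset only and then subtracted from every feature.

```python
import math

def ma_values(red, green):
    """Convert two-channel intensities to M = log2(R/G) and A = 0.5*log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def select_lcg(identity, min_identity=95.0):
    """Step 1 (hypothetical criterion): indices of features whose amplicon
    identity (percent) to the compared genome is at least min_identity."""
    return [i for i, pid in enumerate(identity) if pid >= min_identity]

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(xs, ys, x0, frac):
    """One non-robust Lowess evaluation at x0: tricube-weighted linear
    regression over the ceil(frac * n) nearest points in xs."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # local bandwidth
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    if sw == 0.0:  # no point inside the window: fall back to the plain mean
        return sum(ys) / n
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx)

def subset_lowess_normalize(m, a, subset, frac=0.4):
    """Step 2: fit the M-versus-A trend on the LCG subset only, then subtract
    that fitted correction from every feature on the array."""
    xs = [a[i] for i in subset]
    ys = [m[i] for i in subset]
    return [mi - local_linear_fit(xs, ys, ai, frac) for mi, ai in zip(m, a)]
```

In practice one would proceed in three calls:

1. Compute `m, a = ma_values(red, green)` for an array.
2. Pick the LCG indices with `select_lcg`.
3. Pass both to `subset_lowess_normalize`.

Features outside the subset do not influence the fit. A deleted or highly divergent gene therefore keeps its low M value instead of partially pulling the fitted trend toward itself.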

Conclusion: S-Lowess is implemented in a user-friendly web-tool accessible from http://bioinformatics.biol.rug.nl/websoftware/s-lowess. We demonstrate that it outperforms existing normalization methods and maximizes detection of genomic variation (e.g. deletions) from microbial aCGH data.

Figure 3: Performance of the different normalization methods in identifying deletions in L. lactis MG1363. Blue: number of deletions correctly called (using a 1.5-fold cutoff). Purple: number of deletions missed. S-L: S-Lowess. S-L Sp: S-Lowess normalization based on the LCG set obtained by comparing L. lactis IL1403 amplicon sequences to the ORFs of three S. pneumoniae strains. The total height of each bar is the number of amplicons for ORFs missing in L. lactis MG1363 that have at least 5 aCGH measurements. It therefore indicates the total number of missing ORFs that could be detected from the aCGH data.
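
The counting behind Figure 3 can be illustrated with a small sketch. The caption fixes the 1.5-fold cutoff and the minimum of 5 aCGH measurements. Everything else here is an assumption:

- the measurements of an amplicon are combined by their median;
- ratios are oriented as log2(MG1363/IL1403), so deletions appear as negative values;
- the function names and example values are hypothetical.

```python
import math
from statistics import median

def call_deletion(log2_ratios, fold_cutoff=1.5, min_measurements=5):
    """Call one amplicon. Returns None if it has fewer than min_measurements
    values; otherwise True when the (assumed) median normalized
    log2(MG1363/IL1403) ratio is at or below -log2(fold_cutoff)."""
    if len(log2_ratios) < min_measurements:
        return None
    return median(log2_ratios) <= -math.log2(fold_cutoff)

def detection_summary(known_deleted):
    """known_deleted maps (hypothetical) amplicon IDs of ORFs known to be
    missing to their normalized log2 ratios. Returns (called, missed) over
    assessable amplicons, mirroring the blue and purple bar segments."""
    calls = [call_deletion(r) for r in known_deleted.values()]
    assessable = [c for c in calls if c is not None]
    called = sum(assessable)
    return called, len(assessable) - called

# Hypothetical example: three amplicons for ORFs absent from MG1363.
example = {
    "amp_1": [-1.2, -1.0, -1.4, -1.1, -1.3],  # clearly reduced signal: called
    "amp_2": [-0.1, 0.0, 0.2, -0.3, 0.1],     # no clear reduction: missed
    "amp_3": [-2.0, -2.1, -1.9],              # fewer than 5 values: not counted
}
called, missed = detection_summary(example)
print(f"called {called}, missed {missed}, rate {called / (called + missed):.0%}")
```

Amplicons with fewer than 5 measurements are excluded before counting. The sum called + missed thus corresponds to the total bar height described in the caption.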

Mentions: In general, S-Lowess normalization yielded more correctly detected deletions in the L. lactis MG1363 genome (see Fig. 3 and [17]) than Lowess normalization, total signal normalization, or non-normalized data. The GENCOM method [18] performs poorly on the L. lactis MG1363 versus L. lactis IL1403 aCGH dataset: it classifies only 45 of the more than 2122 genes as divergent, and the differences between the two strains reported by GENCOM are approximately evenly distributed [17].

