Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays.

Thomassen GO, Rowe AD, Lagesen K, Lindvall JM, Rognes T - PLoS ONE (2009)

Bottom Line: The two major computational challenges associated with custom-made arrays are design and analysis. Tiling-arrays are becoming increasingly applicable in genomic research, but researchers still lack both the tools for custom design of arrays, as well as the systems and procedures for analysis of the vast amount of data resulting from such experiments. We believe that the methods described herein will be a useful contribution and resource for researchers designing and analysing custom tiling arrays for both bacteria and higher organisms.


Affiliation: Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, University of Oslo, Oslo, Norway.

ABSTRACT

Background: High-density tiling microarrays are a powerful tool for the characterization of complete genomes. The two major computational challenges associated with custom-made arrays are design and analysis. Firstly, several genome-dependent variables, such as the genome's complexity and sequence composition, need to be considered in the design to ensure a high-quality microarray. Secondly, since tiling projects today very often exceed the limits of conventional array experiments, researchers cannot use established computer tools designed for commercial arrays, and instead have to redesign previous methods or create novel tools.

Principal findings: Here we describe the multiple aspects involved in the design of tiling arrays for transcriptome analysis and detail the normalisation and analysis procedures for such microarrays. We introduce a novel design method used to make two 280,000-feature microarrays covering the entire genomes of the bacterial species Escherichia coli and Neisseria meningitidis, respectively, and describe the use of multiple copies of control probe-sets on tiling microarrays. Furthermore, a novel normalisation and background estimation procedure for tiling arrays is presented, along with a method for array analysis focused on the detection of short transcripts. The design, normalisation and analysis methods have been applied in various experiments, and several of the detected novel short transcripts have been biologically confirmed by Northern blot tests.

Conclusions: Tiling-arrays are becoming increasingly applicable in genomic research, but researchers still lack both the tools for custom design of arrays and the systems and procedures for analysis of the vast amount of data resulting from such experiments. We believe that the methods described herein will be a useful contribution and resource for researchers designing and analysing custom tiling arrays for both bacteria and higher organisms.



Figure 2 (pone-0005943-g002): Standard deviation versus intensity for all probe sets. Plotting standard deviation versus intensity for all probes across the 5 arrays (red circles) allowed a mean level of interest to be calculated for the standard deviation. This was considered as a global measure of the standard deviation (σg) between probes in the set of 5 arrays. All extreme outliers were removed (see text for details) and the result from this filtering is shown by blue circles.

Mentions: There are a number of accepted normalisation techniques that can be applied to microarray data, with varying levels of complexity and transparency. In many experiments, normalisation procedures have proved extremely advantageous; but, as discussed elsewhere [34], for relatively small genomes such as those of E. coli (∼4.6 Mbp) and N. meningitidis (∼2.3 Mbp) the benefits are usually minimal, and the application of complex sequence-based normalisation routines can in fact confound otherwise clean data (see File S1 for full discussion). It is therefore preferable to restrict normalisation to the removal of significant outliers from the data. Ideally, for data from multiple arrays, the variance between the log2 intensities of a single probe-set is independent of the mean log2 intensity of those probes for all but the extremes of the data. Plotting the standard deviation versus the intensity for all probe-sets after aligning the data by the mean values of all chips (red circles in Figure 2) allowed a mean level to be calculated for the standard deviation. This was taken as a global measure of the standard deviation (σg) between probes in the set of 5 chips (see Figure 2). The global standard deviation was then used to remove the worst-case outliers from the data sets. Here, 46,321 out of 2,733,980 data points (about 1.7%) were removed from the MNNG experiment. Outlier detection was performed by sorting the five array signal values for each probe into ascending order and taking the mean of the middle three values as the central value. If either of the two remaining values (the lowest or the highest) was more than three global standard deviations (3σg) from this central value, it was considered to be an outlier with >99% certainty and was discarded. In all other cases, the probe values were retained. The result of this probe outlier filtering is shown as blue circles (Figure 2). This filtering was done before relative expression levels were compared.
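
The outlier filter described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names (`align_by_array_mean`, `global_sd`, `flag_outliers`) are hypothetical, the alignment is assumed to be a simple shift of each array to the grand mean, and σg is assumed to be the plain mean of the per-probe standard deviations. Input values are taken to be log2 intensities, one row per probe and one column per array.

```python
from statistics import mean, stdev


def align_by_array_mean(probes):
    """Shift each array (column) so that all arrays share the same mean.

    Assumption: 'aligning by the mean values of all chips' is read here as
    subtracting each array's mean and adding back the grand mean.
    """
    n_arrays = len(probes[0])
    array_means = [mean(row[j] for row in probes) for j in range(n_arrays)]
    grand_mean = mean(array_means)
    return [
        [row[j] - array_means[j] + grand_mean for j in range(n_arrays)]
        for row in probes
    ]


def global_sd(probes):
    """Assumed form of sigma_g: mean of the per-probe standard deviations."""
    return mean(stdev(values) for values in probes)


def flag_outliers(values, sigma_g, k=3.0):
    """Return one keep/discard flag per array value for a single probe.

    The values are ranked, the central value is the mean of everything
    except the lowest and highest (the middle three of five), and only the
    two extremes are tested against k * sigma_g.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    central = mean(values[i] for i in order[1:-1])
    keep = [True] * len(values)
    for i in (order[0], order[-1]):
        if abs(values[i] - central) > k * sigma_g:
            keep[i] = False
    return keep


# Toy example: one probe measured on five arrays, with one aberrant value.
# sigma_g is supplied by hand here; in practice it would come from
# global_sd() applied to all probes after alignment.
probe = [8.0, 8.1, 7.9, 8.05, 12.0]
print(flag_outliers(probe, sigma_g=0.2))  # [True, True, True, True, False]
```

Because only the lowest and highest values are tested against the central value, at most two of the five measurements for a probe can be discarded, and the central value itself is never influenced by a single extreme reading.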

