Simulation of microarray data with realistic characteristics.

Nykter M, Aho T, Ahdesmäki M, Ruusuvuori P, Lehmussola A, Yli-Harja O - BMC Bioinformatics (2006)

Bottom Line: It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide-based single-channel microarrays. All this makes the model a valuable tool, for example, for validating data analysis algorithms.


Affiliation: Institute of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.nykter@tut.fi

ABSTRACT

Background: Microarray technologies have become common tools in biological research. As a result, a need for effective computational methods for data analysis has emerged. Numerous different algorithms have been proposed for analyzing the data. However, an objective evaluation of the proposed algorithms is not possible due to the lack of biological ground truth information. To overcome this fundamental problem, the use of simulated microarray data for algorithm validation has been proposed.

Results: We present a microarray simulation model which can be used to validate different kinds of data analysis algorithms. The proposed model is unique in that it includes all the steps that affect the quality of real microarray data: simulating biological ground-truth data, applying biological and measurement-technology-specific error models, and finally simulating microarray slide manufacturing and hybridization. With all of these steps taken into account, the simulated data has realistic biological and statistical characteristics. The applicability of the proposed model is demonstrated by several examples.
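The staged structure described in the Results paragraph can be pictured as a chain of functions, each adding one layer of variation on top of the previous one. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the log-normal ground-truth and biological-noise distributions, and all parameter values are hypothetical. The measurement step uses the additive-plus-multiplicative form y = α + μ·e^η + ε of Rocke and Durbin (2001) only as one example of a previously proposed error model, and the slide manufacturing and hybridization stage is omitted.

```python
import math
import random


def simulate_ground_truth(n_genes, frac_diff=0.1, fold=2.0, rng=None):
    """Step 1 (hypothetical): true expression levels for two conditions."""
    rng = rng if rng is not None else random.Random(0)
    # Log-normal baseline expression; log-scale mean/sd are assumed values.
    reference = [math.exp(rng.gauss(7.0, 1.5)) for _ in range(n_genes)]
    test = []
    for x in reference:
        if rng.random() < frac_diff:
            # Differentially expressed gene: up- or down-regulated by `fold`.
            x = x * fold if rng.random() < 0.5 else x / fold
        test.append(x)
    return reference, test


def add_biological_variation(values, sigma=0.1, rng=None):
    """Step 2 (hypothetical): multiplicative log-normal biological noise."""
    rng = rng if rng is not None else random.Random(1)
    return [v * math.exp(rng.gauss(0.0, sigma)) for v in values]


def add_measurement_error(values, alpha=50.0, sigma_eta=0.15, sigma_eps=20.0, rng=None):
    """Step 3 (hypothetical): y = alpha + mu * exp(eta) + eps, an
    additive-plus-multiplicative error model in the style of Rocke and Durbin."""
    rng = rng if rng is not None else random.Random(2)
    return [alpha + mu * math.exp(rng.gauss(0.0, sigma_eta)) + rng.gauss(0.0, sigma_eps)
            for mu in values]


def simulate(n_genes=1000, seed=0):
    """Chain the steps for a two-channel array; slide/hybridization is omitted."""
    rng = random.Random(seed)
    truth = simulate_ground_truth(n_genes, rng=rng)
    # One measured channel per condition, each with its own noise draws.
    channels = tuple(
        add_measurement_error(add_biological_variation(level, rng=rng), rng=rng)
        for level in truth
    )
    return truth, channels
```

Because each stage is a separate function, any one of them can be replaced by a different error model without touching the others, which is the kind of modularity the abstract emphasizes. Keeping the ground truth alongside the noisy channels is what makes the output usable for validating analysis algorithms.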

Conclusion: The proposed microarray simulation model is modular and can be used in different kinds of applications. It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide-based single-channel microarrays. All this makes the model a valuable tool, for example, for validating data analysis algorithms.



Figure 11: Slide image segmentation examples. One subarray from each of the images used to test the segmentation algorithms is shown. From left to right: (a) a high-quality slide, (b) a noisy slide with artifacts, and (c) a slide with heavy noise and artifacts across its surface. The increase in noise and the degradation of spot quality are clearly visible.

Mentions: We simulate three test images, each consisting of eight subarrays with 1000 spots in total. Each image has different quality characteristics. The first image is of high quality, with low-variance noise (variance 0.01) and relatively round, regularly sized spots. The second image has more noise (variance 0.02) and more irregular spot shapes and sizes. The third has even stronger noise (variance 0.03), and its spot shapes and sizes vary more than in the other two images. Air bubbles, scratches, spot bleeding, and print-tip effects are added to the second and third images, with the third containing more such artifacts than the second. Figure 11 shows one subarray from each of the images used in this experiment. Detailed information about the simulation parameters for these three images is available on the companion web page.
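To make the quality levels above concrete, here is a minimal sketch of how one subarray at a given noise variance might be rendered. It rests on several assumptions not stated in the text: intensities are on a normalized scale with a zero background, spots are filled ellipses whose radii are randomly perturbed by a hypothetical `shape_jitter` parameter, scratches are saturated straight lines, and the noise is additive Gaussian with the stated variance. Air bubbles, spot bleeding, and print-tip effects are not modeled, and `simulate_subarray` is a hypothetical helper, not part of the authors' software.

```python
import math
import random


def simulate_subarray(rows=4, cols=4, spacing=20, noise_var=0.01,
                      shape_jitter=0.1, n_scratches=0, seed=0):
    """Hypothetical sketch: one subarray as a 2-D list of pixel intensities.

    Spots are filled ellipses on a zero background; noise is additive
    Gaussian with variance `noise_var` and is not clipped.
    """
    if not 0 <= shape_jitter < 1:
        raise ValueError("shape_jitter must be in [0, 1)")
    rng = random.Random(seed)
    h, w = rows * spacing, cols * spacing
    img = [[0.0] * w for _ in range(h)]
    base_r = 0.3 * spacing
    for i in range(rows):
        for j in range(cols):
            cy, cx = (i + 0.5) * spacing, (j + 0.5) * spacing
            # Irregular spots: independent random scaling of the two radii.
            ry = base_r * (1 + rng.uniform(-shape_jitter, shape_jitter))
            rx = base_r * (1 + rng.uniform(-shape_jitter, shape_jitter))
            level = rng.uniform(0.3, 0.9)  # spot intensity (normalized units)
            for y in range(max(0, int(cy - ry) - 1), min(h, int(cy + ry) + 2)):
                for x in range(max(0, int(cx - rx) - 1), min(w, int(cx + rx) + 2)):
                    # Fill pixels whose centers lie inside the ellipse.
                    if ((y + 0.5 - cy) / ry) ** 2 + ((x + 0.5 - cx) / rx) ** 2 <= 1.0:
                        img[y][x] = level
    # Scratches: saturated, roughly straight lines across the subarray.
    for _ in range(n_scratches):
        y0, slope = rng.uniform(0, h - 1), rng.uniform(-0.3, 0.3)
        for x in range(w):
            y = int(round(y0 + slope * x))
            if 0 <= y < h:
                img[y][x] = 1.0
    # Additive Gaussian noise with the requested variance.
    sd = math.sqrt(noise_var)
    for row in img:
        for x in range(w):
            row[x] += rng.gauss(0.0, sd)
    return img


# Three quality levels: the variances come from the text; the jitter and
# scratch counts are hypothetical choices for illustration.
QUALITY_LEVELS = [(0.01, 0.05, 0), (0.02, 0.15, 1), (0.03, 0.25, 3)]
images = [simulate_subarray(noise_var=v, shape_jitter=j, n_scratches=s)
          for v, j, s in QUALITY_LEVELS]
```

The noise is deliberately left unclipped so that the background variance matches `noise_var`; background pixels can therefore go slightly negative, and clipping may be added if downstream segmentation code expects non-negative intensities.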

