Simulation of microarray data with realistic characteristics.

Nykter M, Aho T, Ahdesmäki M, Ruusuvuori P, Lehmussola A, Yli-Harja O - BMC Bioinformatics (2006)

Bottom Line: It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide-based single-channel microarrays. All this makes the model a valuable tool, for example, in the validation of data analysis algorithms.


Affiliation: Institute of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.nykter@tut.fi

ABSTRACT

Background: Microarray technologies have become common tools in biological research. As a result, a need for effective computational methods for data analysis has emerged. Numerous different algorithms have been proposed for analyzing the data. However, an objective evaluation of the proposed algorithms is not possible due to the lack of biological ground truth information. To overcome this fundamental problem, the use of simulated microarray data for algorithm validation has been proposed.

Results: We present a microarray simulation model which can be used to validate different kinds of data analysis algorithms. The proposed model is unique in the sense that it includes all the steps that affect the quality of real microarray data. These steps include the simulation of biological ground truth data, applying biological and measurement technology specific error models, and finally simulating the microarray slide manufacturing and hybridization. After all these steps are taken into account, the simulated data has realistic biological and statistical characteristics. The applicability of the proposed model is demonstrated by several examples.

Conclusion: The proposed microarray simulation model is modular and can be used in different kinds of applications. It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide based single-channel microarrays. All this makes the model a valuable tool for example in validation of data analysis algorithms.


© Copyright Policy - open-access

Figure 12: Results of segmentation example. The spot intensities estimated from the simulated images with the fixed circle (first row), the histogram segmentation (second row), and the seeded region growing (third row) segmentation algorithms are plotted against the input data (reference). The plots are from the first channel of the test images: (a) intensities for the high quality image given by the three segmentation algorithms, (b) intensity plots for image with noise and errors, (c) plots for image with disturbing noise and artefacts.

Mentions: The results of applying the selected segmentation algorithms to the synthetic test images and calculating the spot intensities from the segmentation results are shown in Figure 12, where the estimated spot intensities are plotted against the reference signal. Figure 12(a) shows the scatter plots for the first image, 12(b) shows the plots for the second image of slightly degraded quality, and 12(c) presents the plots for the third, low-quality image. After removing the estimated background, some of the spot intensities become negative. These negative intensities are replaced with zeros. To quantify the performance of the different algorithms, we compute the correlation coefficient for each comparison. The results are given in Table 7. Even though we mainly focus on simulating images with realistic parameters, some observations on the segmentation results are presented. The results in Table 7 support intuition: all methods give worse results as the image quality is degraded. The fixed circle (FC) segmentation is likely to be confused by the irregular shapes and sizes of the spots in the second image (shown in Figure 11(b)) and especially in the third image (shown in Figure 11(c)). The other methods, histogram (HST) and seeded region growing (SRG) segmentation, are corrupted mainly by the noise in the second and third images. Despite its high correlation with the reference expressions, the intensity given by the HST segmentation method suffers from a relatively high bias. However, the low scattering of the intensities given by HST, compared to that of FC and SRG, explains the high correlation. HST also has fewer outliers on the lower side of the scatter plot. Both the bias in HST and the scattering in FC and SRG are clearly visible in Figure 12. The results of the segmentation experiment are well in accordance with the basic assumptions. Thus, the images produced by the proposed simulation model can be used for testing microarray image processing algorithms, and the model provides useful information about the available methods.
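The evaluation described above (background removal, replacing negative intensities with zeros, and correlating the estimates with the reference) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `evaluate_segmentation`, the toy intensity values, and the use of the mean signed error as a measure of bias are all assumptions made for this example. It also illustrates how an estimate can correlate highly with the reference while still carrying a bias.

```python
import numpy as np

def evaluate_segmentation(foreground, background, reference):
    """Compare background-corrected spot intensities with the reference.

    Hypothetical helper: all names and the bias measure are assumptions.
    """
    foreground = np.asarray(foreground, dtype=float)
    background = np.asarray(background, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Remove the estimated background; negative results are replaced
    # with zeros, as described in the text.
    corrected = np.clip(foreground - background, 0.0, None)

    # Pearson correlation coefficient between estimate and reference.
    r = float(np.corrcoef(corrected, reference)[0, 1])

    # Mean signed error, used here as a simple stand-in for "bias"
    # (an assumption; the source does not define how bias is measured).
    bias = float(np.mean(corrected - reference))
    return corrected, r, bias

# Toy data (made up for illustration): five spots, constant background.
reference = np.array([5.0, 100.0, 200.0, 300.0, 400.0])
foreground = np.array([40.0, 180.0, 290.0, 380.0, 490.0])
background = np.full(5, 50.0)

corrected, r, bias = evaluate_segmentation(foreground, background, reference)
print("corrected intensities:", corrected)
print("correlation: %.4f, bias: %.1f" % (r, bias))
```

In this toy case the first spot falls below the background estimate and is clipped to zero, and the estimates track the reference closely even though they are offset from it, so correlation alone would not reveal the offset.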

