Simulation of microarray data with realistic characteristics.

Nykter M, Aho T, Ahdesmäki M, Ruusuvuori P, Lehmussola A, Yli-Harja O - BMC Bioinformatics (2006)

Bottom Line: The model includes several previously proposed error models and can be used with different types of input data. It can simulate both spotted two-channel and oligonucleotide-based single-channel microarrays. This makes the model a valuable tool, for example, in the validation of data analysis algorithms.

Affiliation: Institute of Signal Processing, Tampere University of Technology, Tampere, Finland. matti.nykter@tut.fi

ABSTRACT

Background: Microarray technologies have become common tools in biological research. As a result, a need for effective computational methods for data analysis has emerged. Numerous different algorithms have been proposed for analyzing the data. However, an objective evaluation of the proposed algorithms is not possible due to the lack of biological ground truth information. To overcome this fundamental problem, the use of simulated microarray data for algorithm validation has been proposed.

Results: We present a microarray simulation model which can be used to validate different kinds of data analysis algorithms. The proposed model is unique in the sense that it includes all the steps that affect the quality of real microarray data. These steps include the simulation of biological ground truth data, applying biological and measurement technology specific error models, and finally simulating the microarray slide manufacturing and hybridization. After all these steps are taken into account, the simulated data has realistic biological and statistical characteristics. The applicability of the proposed model is demonstrated by several examples.

Conclusion: The proposed microarray simulation model is modular and can be used in different kinds of applications. It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide based single-channel microarrays. All this makes the model a valuable tool for example in validation of data analysis algorithms.

Figure 6: Simulated ground truth signals. Gene expression profiles of the selected genes. The effect of the gene knockout on the expression profiles is clearly observable. The reference signal is shown with a solid line and the test signal with a dashed line.

Mentions: We first demonstrate the use of the proposed microarray model using simulated gene expression data. The ground truth biological signals are generated using a random network topology, with kinetic rate laws describing the transcription processes and the degradation of the gene products [18]. Details of the data generation can be found on our companion web site. We use a gene knockout experiment as a case study [31]. The reference data is obtained by simulating the generated network. The test sample is then obtained by knocking out a randomly chosen gene and running the same simulation on the modified network. Simulated gene expression profiles of a few selected genes that were affected by the knockout are shown in Figure 6. Next, an error model is applied to the ground truth data. We use the hierarchical error model to represent the biological and the measurement-specific noise [6]. Figure 7 illustrates the simulated gene expression profiles after the noise has been added.
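
The knockout experiment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoidal rate law, the forward-Euler integration, all parameter ranges, and the function names (`random_network`, `simulate`, `add_noise`) are assumptions chosen for illustration. In particular, `add_noise` uses a simple multiplicative log-normal plus additive Gaussian term as a stand-in, not the hierarchical error model of [6].

```python
import numpy as np

def random_network(n_genes, n_regulators, rng):
    """Random signed topology: W[i, j] != 0 means gene j regulates gene i."""
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        regs = rng.choice(n_genes, size=n_regulators, replace=False)
        W[i, regs] = rng.choice([-1.0, 1.0], size=n_regulators)
    return W

def simulate(W, k, d, x0, knocked_out=None, dt=0.01, n_steps=2000):
    """Forward-Euler integration of dx/dt = k * sigmoid(W @ x) - d * x.

    k holds transcription rates, d degradation rates (assumed rate laws).
    A knockout sets the gene's transcription rate and initial level to zero.
    """
    k = k.copy()
    x = x0.copy()
    if knocked_out is not None:
        k[knocked_out] = 0.0
        x[knocked_out] = 0.0
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for t in range(n_steps):
        production = k / (1.0 + np.exp(-(W @ x)))
        x = x + dt * (production - d * x)
        traj[t + 1] = x
    return traj

def add_noise(signal, rng, sigma_mult=0.1, sigma_add=0.02):
    """Stand-in noise: multiplicative log-normal plus additive Gaussian."""
    mult = rng.lognormal(0.0, sigma_mult, size=signal.shape)
    add = rng.normal(0.0, sigma_add, size=signal.shape)
    return signal * mult + add

rng = np.random.default_rng(0)
n = 20
W = random_network(n, 3, rng)
k = rng.uniform(0.5, 2.0, n)   # transcription rates (assumed range)
d = rng.uniform(0.2, 1.0, n)   # degradation rates (assumed range)
x0 = rng.uniform(0.0, 1.0, n)  # initial expression levels

ko = int(rng.integers(n))                      # randomly chosen knockout gene
reference = simulate(W, k, d, x0)              # intact network
test = simulate(W, k, d, x0, knocked_out=ko)   # same network, gene removed

noisy_reference = add_noise(reference, rng)
noisy_test = add_noise(test, rng)
```

Plotting columns of `reference` against the same columns of `test` gives curves in the spirit of Figure 6, and the noisy arrays correspond loosely to the situation shown in Figure 7.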

