tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities.

Miok V, Wilting SM, van de Wiel MA, Jaspers A, van Noort PI, Brakenhoff RH, Snijders PJ, Steenbergen RD, van Wieringen WN - BMC Bioinformatics (2014)

Bottom Line: In addition, to make estimates of the DNA copy number more stable, model parameters are also estimated in a multivariate way using triplets of features, imposing a spatial prior for the copy number effect. With the proposed method for analysis of time-course multilevel molecular data, more profound insight may be gained through the identification of temporal differential expression induced by DNA copy number abnormalities. Furthermore, the proposed method yields improvements in sensitivity, specificity and reproducibility compared to existing methods.


Affiliation: Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. Box 7057, 1007 MB, Amsterdam, The Netherlands. w.vanwieringen@vumc.nl.

ABSTRACT

Background: To determine which changes in the host cell genome are crucial for cervical carcinogenesis, a longitudinal in vitro model system of HPV-transformed keratinocytes was profiled in a genome-wide manner. Four cell lines affected with either HPV16 or HPV18 were assayed at 8 sequential time points for gene expression (mRNA) and gene copy number (DNA) using high-resolution microarrays. Available methods for temporal differential expression analysis are not designed for integrative genomic studies.

Results: Here, we present a method that allows for the identification of differential gene expression associated with DNA copy number changes over time. The temporal variation in gene expression is described by a generalized linear mixed model employing low-rank thin-plate splines. Model parameters are estimated with an empirical Bayes procedure, which exploits integrated nested Laplace approximation for fast computation. Iteratively, posteriors of hyperparameters and model parameters are estimated. The empirical Bayes procedure shrinks multiple dispersion-related parameters. Shrinkage leads to more stable estimates of the model parameters, better control of false positives and improvement of reproducibility. In addition, to make estimates of the DNA copy number more stable, model parameters are also estimated in a multivariate way using triplets of features, imposing a spatial prior for the copy number effect.
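To make the spline component concrete, the following is a minimal sketch of one common construction of a low-rank thin-plate spline design matrix for a single time covariate. It is not taken from the tigaR implementation: the function name `thin_plate_basis`, the three knot locations, and the coding of the 8 time points as 1 to 8 are assumptions made for illustration only.

```python
import numpy as np


def thin_plate_basis(t, knots):
    """Random-effect design matrix Z of a 1-D low-rank thin-plate spline.

    Z = Z_K @ Omega^{-1/2}, with Z_K[i, k] = |t_i - knot_k|^3 and
    Omega[k, l] = |knot_k - knot_l|^3.
    Hypothetical helper for illustration, not tigaR code.
    """
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    Z_K = np.abs(t[:, None] - knots[None, :]) ** 3
    Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
    # Omega is symmetric but not positive definite, so its inverse
    # "square root" is defined through the singular value decomposition.
    U, d, Vt = np.linalg.svd(Omega)
    Omega_inv_sqrt = (U / np.sqrt(d)) @ Vt
    return Z_K @ Omega_inv_sqrt


time_points = np.arange(1, 9)       # 8 sequential time points (assumed coding 1..8)
knots = np.array([2.5, 4.5, 6.5])   # assumed knot locations
Z = thin_plate_basis(time_points, knots)

# Unpenalized part of the spline: intercept and linear trend in time.
X = np.column_stack([np.ones(len(time_points)), time_points])
```

In a mixed-model formulation of this kind, the columns of `X` would receive fixed effects and the columns of `Z` random effects, with the random-effect variance controlling how smooth the fitted time course is.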

Conclusion: With the proposed method for analysis of time-course multilevel molecular data, more profound insight may be gained through the identification of temporal differential expression induced by DNA copy number abnormalities. In particular, in the analysis of an integrative oncogenomics study with a time-course set-up, our method finds genes previously reported to be involved in cervical carcinogenesis. Furthermore, the proposed method yields improvements in sensitivity, specificity and reproducibility compared to existing methods. Finally, the proposed method is able to handle count (RNAseq) data from time-course experiments, as is shown on a real data set.



Fig2: Histograms of DNA copy number parameters. The left panel shows DNA copy number parameters estimated using a standard spline design matrix, while the right panel shows parameters estimated using an orthogonalized spline design matrix.

Mentions: Turning to the effect of DNA copy number, we first analyzed the data with Model (2) containing only the fixed cell line and DNA copy number effects. This analysis identified 568 features with a significant gene dosage effect on expression. Including the time effect in Model (2) reduces the number of features with a significant gene dosage effect on expression (third row of Table 1). This is because the spline competes with DNA copy number to explain the variation in expression levels: the former, being more flexible, captures variation caused by the latter. That aside, here too we see that the improved fit (now due to the increased flexibility of a different spline per cell line) yields a surge in the number of findings. Orthogonalization of the spline basis identifies only one additional feature (using common splines) compared to the standard analysis. The effect of orthogonalization is more visible in the gene dosage effect βj, as can be seen in Figure 2. Clearly, orthogonalization moves the distribution of the βj’s to the right (the positive domain, which agrees with the biologically expected direction of the effect). Finally, on the full data set (not shown), the analysis using the orthogonalized spline basis gives a modest improvement in the number of genes significantly affected by DNA copy number.
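The competition between the spline and DNA copy number, and what orthogonalization changes, can be illustrated with a small sketch. It assumes that orthogonalization means projecting each column of the spline design matrix onto the orthogonal complement of the copy number covariate; the text above does not spell out the exact procedure, so this is an assumption. The helper `orthogonalize_against`, the toy copy number values, and the simple truncated-line basis are all hypothetical.

```python
import numpy as np


def orthogonalize_against(Z, x):
    """Remove from every column of Z its least-squares projection on x.

    Hypothetical helper: returns Z - x (x'x)^{-1} x'Z, so that each
    returned column is orthogonal to x.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    Z = np.asarray(Z, dtype=float)
    coef, *_ = np.linalg.lstsq(x, Z, rcond=None)
    return Z - x @ coef


# Toy inputs for one feature in one cell line (illustrative values only).
t = np.arange(1, 9, dtype=float)                          # 8 time points
x = np.array([0.0, 0.0, 0.1, 0.4, 0.5, 0.5, 0.6, 0.6])    # toy copy number signal
Z = np.column_stack([np.maximum(t - k, 0.0) for k in (2.5, 4.5, 6.5)])  # toy spline basis

Z_orth = orthogonalize_against(Z, x)

overlap_before = x @ Z       # nonzero: the spline columns can mimic x
overlap_after = x @ Z_orth   # numerically zero: no shared direction with x
```

After the projection, no linear combination of the spline columns has a component along `x`, so under this assumption the variation aligned with copy number is left for the gene dosage coefficient βj to explain.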

