A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression.

Kalaitzis AA, Lawrence ND - BMC Bioinformatics (2011)

Bottom Line: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.


Affiliation: The Sheffield Institute for Translational Neuroscience, 385A Glossop Road, Sheffield, S10 2HQ, UK. A.Kalaitzis@sheffield.ac.uk

ABSTRACT

Background: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied without regard for the fact that the data are drawn from a time series. In this paper we propose a simple model, based on a Gaussian process, that accounts for the underlying temporal nature of the data.

Results: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach that can be used to filter quiet genes or, for time series in the form of expression ratios, to quantify differential expression. We assess the rankings produced by our regression framework via ROC curves and compare them to those of a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). The comparison, on both simulated and experimental data, shows that the proposed approach considerably outperforms the current state of the art.
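
The abstract does not spell out how a GP fit becomes a ranking score. One plausible reading, sketched below as an assumption, is to compare the marginal likelihood of a smooth squared-exponential GP against a noise-only model and rank profiles by the log-ratio. The function names, the hyperparameter grid, and the use of a grid search instead of gradient-based optimisation are all illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale, signal_var):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal_likelihood(t, y, lengthscale, signal_var, noise_var):
    """log N(y | 0, K + noise_var * I) for a zero-mean GP."""
    n = len(t)
    K = rbf_kernel(t, t, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def ranking_score(t, y):
    """Log-ratio: best smooth-signal fit on a grid vs. a noise-only fit."""
    y = y - y.mean()
    total_var = y.var() + 1e-12
    # Noise-only model: all observed variance is attributed to noise.
    noise_only = log_marginal_likelihood(t, y, 1.0, 0.0, total_var)
    best = -np.inf
    # Hypothetical grid; a real analysis would optimise these hyperparameters.
    for lengthscale in np.linspace(0.5, 5.0, 10):
        for frac in np.linspace(0.1, 0.9, 9):
            lml = log_marginal_likelihood(
                t, y, lengthscale, frac * total_var, (1.0 - frac) * total_var
            )
            best = max(best, lml)
    return best - noise_only

# Toy profiles (synthetic, not from the paper): one smooth, one pure noise.
t = np.linspace(0.0, 10.0, 12)
rng = np.random.default_rng(0)
smooth_score = ranking_score(t, np.sin(2 * np.pi * t / 10.0))
noise_score = ranking_score(t, rng.normal(size=t.size))
print(smooth_score, noise_score)
```

Under this sketch a profile with smooth temporal structure receives a larger score than a quiet, noise-like profile, so sorting genes by the score gives a ranking of the kind that can be assessed with ROC curves.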

Conclusions: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.
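
To illustrate why replicates and missing values fit naturally into this framework, here is a minimal sketch of GP prediction with fixed, hand-picked hyperparameters (an assumption; in practice they would be estimated). Replicates are simply repeated time points, missing values are dropped before conditioning, and the posterior variance gives an approximate 95% interval along the curve. The data and the name gp_predict are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale, signal_var):
    """Squared-exponential covariance between two sets of time points."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(t_obs, y_obs, t_new, lengthscale=2.0, signal_var=1.0, noise_var=0.1):
    """Posterior mean and variance of the latent curve at t_new."""
    keep = ~np.isnan(y_obs)            # missing values are simply left out
    t, y = t_obs[keep], y_obs[keep]
    # Replicates share a time point; the noise term keeps K positive definite.
    K = rbf(t, t, lengthscale, signal_var) + noise_var * np.eye(len(t))
    Ks = rbf(t_new, t, lengthscale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Two replicates per time point, two of them missing (toy data).
t_obs = np.array([0.0, 0.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0])
y_obs = np.array([0.1, np.nan, 0.9, 1.1, 0.8, 1.0, np.nan, 0.2])
t_new = np.array([3.0, 20.0])

mean, var = gp_predict(t_obs, y_obs, t_new)
lower = mean - 1.96 * np.sqrt(var)     # approximate 95% interval on the latent curve
upper = mean + 1.96 * np.sqrt(var)
print(mean, lower, upper)
```

The interval is narrow at t = 3, which lies between observed time points, and widens to the prior width at t = 20, far from any observation.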


© Copyright Policy - open-access

Figure 4: GP vs. BATS on experimental data. ROC curves for the GP and BATS methods on experimental data from [13]. As in Figure 2, one ROC curve and the area under it (AUC) are depicted for the GP method and three for BATS, each using a different noise model indicated by the subscript in the legend. (a) Ground truth consists of 22690 labels among which only the 786 profiles chosen to be ranked by TSNI (based on the area under their curves) are labeled as "1", cf. Experimental data. (b) Same number of labels; here only the top 100 profiles ranked by TSNI are labeled as "1".
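
As a reminder of how an ROC curve and its AUC are obtained from a ranking and a set of binary ground-truth labels, the following is a small sketch. It is a generic construction, not the authors' evaluation code, it does not treat tied scores specially, and the scores and labels are made up.

```python
import numpy as np

def roc_curve(scores, labels):
    """False- and true-positive rates as the ranking is traversed top-down."""
    order = np.argsort(-scores, kind="stable")
    ranked = labels[order]
    tp = np.cumsum(ranked)             # positives recovered so far
    fp = np.cumsum(1 - ranked)         # negatives admitted so far
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy ranking: six profiles, three of them labeled "1".
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])

fpr, tpr = roc_curve(scores, labels)
toy_auc = auc(fpr, tpr)
print(toy_auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```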

Mentions: We label the top 100 positions of the TSNI ranking as "1" in the ground truth because they are the most likely to be direct targets of the TRP63 transcription factor and because the binding scores (computed as the sum of -log2 of the p-values of all TRP63-binding regions identified by ChIP-chip experiments) are most densely distributed amongst the first 100 positions; see Figure 3. Furthermore, in [13] these 100 positions were further validated by gene set enrichment analysis (GSEA) [31] to check whether their up/down regulation patterns were correlated with those of genes that respond to TRP63 knock-downs in general. In summary, "the top 100 TSNI ranked transcripts are significantly enriched for the strongest binding sites" [13]. Figure 4 illustrates the comparison on the experimental data.
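
The two ingredients of this ground truth can be sketched as follows: a binding score formed as the sum of -log2 of the p-values of a gene's binding regions, and binary labels that mark the top k positions of a ranking as "1". The gene identifiers, p-values, and function names are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

def binding_score(p_values):
    """Sum of -log2(p) over all binding regions assigned to one gene."""
    p = np.asarray(p_values, dtype=float)
    return float(np.sum(-np.log2(p)))

def top_k_labels(ranked_ids, all_ids, k=100):
    """Label "1" for identifiers in the top k of a ranking, "0" otherwise."""
    top = set(ranked_ids[:k])
    return np.array([1 if g in top else 0 for g in all_ids])

# Two binding regions with p = 0.5 and p = 0.25 give 1 + 2 = 3.
example_score = binding_score([0.5, 0.25])

# Hypothetical identifiers; k is 2 here instead of 100 to keep the toy small.
ranking = ["g3", "g1", "g4", "g2"]
all_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_labels = top_k_labels(ranking, all_genes, k=2)
print(example_score, toy_labels)
```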

