An integrative clustering and modeling algorithm for dynamical gene expression data.

Sivriver J, Habib N, Friedman N - Bioinformatics (2011)

Bottom Line: Our approach provides an easy way to compare responses to different stimuli at the dynamical level. We use our approach to analyze the dynamical transcriptional responses to inflammation and anti-viral stimuli in mouse primary dendritic cells, and extract a concise representation of the different dynamical response types. We analyze the similarities and differences between the two stimuli and identify potential regulators of this complex transcriptional response.


Affiliation: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.

ABSTRACT

Motivation: The precise dynamics of gene expression is often crucial for proper response to stimuli. Time-course gene-expression profiles can provide insights about the dynamics of many cellular responses, but are often noisy and measured at arbitrary intervals, posing a major analysis challenge.

Results: We developed an algorithm that interleaves clustering time-course gene-expression data with estimation of dynamic models of their response by biologically meaningful parameters. In combining these two tasks we overcome obstacles posed in each one. Moreover, our approach provides an easy way to compare between responses to different stimuli at the dynamical level. We use our approach to analyze the dynamical transcriptional responses to inflammation and anti-viral stimuli in mice primary dendritic cells, and extract a concise representation of the different dynamical response types. We analyze the similarities and differences between the two stimuli and identify potential regulators of this complex transcriptional response.

Availability: The code for our method is freely available at http://www.compbio.cs.huji.ac.il/DynaMiteC.

Contact: nir@cs.huji.ac.il.

Figure 2: Evaluation on Synthetic Data. (A) Illustration of our three training datasets and our test set. Each horizontal line shows the time points at which the data were sampled in each dataset. The purple vertical lines show the time points used as test data. (B) Median squared fit error for test values across increasing sampling noise levels, as measured on the three datasets: the 10 time points dataset (basic), the non-uniformly sampled dataset (early) and the six time points dataset (six time points). DynaMiteC (DYNA) is compared to our impulse model with no priors (no priors) and to the model of Chechik et al. (impulse). (C) As in (B), but showing the median squared error in the predicted parameters h1 (left) and t1 (right) compared to the true parameters, both on the basic dataset. (D) Mutual information between the predicted and true clustering labels per gene. The datasets (from left to right) are the 10 time points dataset (basic), the non-uniformly sampled dataset (early) and the six time points dataset (six time points), all three with increasing levels of parameter noise (variation in the clusters) and constant sampling noise. The compared methods are DynaMiteC (DYNA), K-means with Euclidean distance (Euclidian), and K-means with Pearson correlation (Correlation). (E) As in (D), but on the basic dataset with increasing levels of sampling noise and constant parameter noise.
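
Panels (D) and (E) score a clustering by the mutual information between predicted and true cluster labels. The following is a minimal sketch of that quantity, not the authors' evaluation code. The function name `mutual_information`, the use of bits (log base 2), and the absence of any normalization are assumptions, since the caption does not specify them.

```python
import math
from collections import Counter


def mutual_information(true_labels, pred_labels):
    """Mutual information (in bits) between two labelings of the same genes.

    Both arguments are equal-length sequences; element i is the cluster
    label assigned to gene i. Label names need not match between the two.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))  # counts of label pairs
    count_true = Counter(true_labels)               # marginal counts
    count_pred = Counter(pred_labels)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_true[a] * count_pred[b]))
    return mi
```

Because only the co-occurrence of labels matters, a predicted clustering that recovers the true groups under different cluster names still receives the maximal score, which is why this measure suits comparing K-means output with known prototypes.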

Mentions: Starting from cluster prototypes, we generated data using two types of variation. First, the amount of variation within each cluster: how much the parameters of each gene deviate from the prototype it was generated from (using multiplicative Gaussian noise). Second, the noise in the data: how much the observed log-ratio expression value differs from the actual value for the gene (using additive Gaussian noise). From each prototype we created a set of 110–150 genes sampled across different time series (Fig. 2a). For each such gene, both the model parameters and its cluster (according to the original prototype) are known. In this setting, we can test both our modeling method (prediction on new time points) and our clustering method.
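
The generation procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code. The curve `impulse` is a product-of-two-sigmoids stand-in whose exact form and parameter names (other than h1 and t1, which the caption mentions) are assumed. Multiplicative noise is taken to mean scaling each parameter by a draw from a Gaussian with mean 1. The prototype values and time points are hypothetical.

```python
import math
import random


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Impulse-like response: a rising sigmoid times a falling sigmoid.

    Assumed stand-in form; h1 is the peak height and t1 the onset time.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return rise * fall / h1


def make_cluster(prototype, times, param_sd, sample_sd, rng):
    """Generate one synthetic cluster of 110 to 150 genes from a prototype.

    param_sd  : spread of the multiplicative Gaussian noise on parameters
                (variation within the cluster).
    sample_sd : spread of the additive Gaussian noise on observed values
                (sampling noise).
    Returns a list of (true_params, observed_values) pairs.
    """
    genes = []
    for _ in range(rng.randint(110, 150)):
        # Each gene's parameters deviate multiplicatively from the prototype.
        params = {k: v * rng.gauss(1.0, param_sd) for k, v in prototype.items()}
        # Observed values are the true curve plus additive noise.
        values = [impulse(t, **params) + rng.gauss(0.0, sample_sd) for t in times]
        genes.append((params, values))
    return genes


# Hypothetical prototype and sampling schedule, for illustration only.
PROTOTYPE = dict(h0=0.0, h1=3.0, h2=1.0, t1=2.0, t2=8.0, beta=1.0)
TIMES = [0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0, 24.0]

cluster = make_cluster(PROTOTYPE, TIMES, param_sd=0.1, sample_sd=0.2,
                       rng=random.Random(0))
```

Keeping the true parameters alongside each gene's observed values is what allows both checks named in the text: parameter and held-out time point error for the modeling step, and agreement with the originating prototype for the clustering step.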

