DTW-MIC Coexpression Networks from Time-Course Data.
Bottom Line:
When modeling coexpression networks from high-throughput time course data, Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. Here we propose to overcome its inability to capture non-linear interactions and time shifts by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), which combines a measure that accounts for functional interactions between signals (MIC) with a measure that identifies time lag (DTW). Using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on four synthetic datasets and one transcriptomic dataset, and compared with TimeDelay ARACNE and Transfer Entropy.
View Article:
PubMed Central - PubMed
Affiliation: Fondazione Bruno Kessler, Trento, Italy.
ABSTRACT
When modeling coexpression networks from high-throughput time course data, Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. However, its reliability is limited since it cannot capture non-linear interactions and time shifts. Here we propose to overcome these two issues by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), which combines a measure that accounts for functional interactions between signals (MIC) with a measure that identifies time lag (DTW). Using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on four synthetic datasets and one transcriptomic dataset, and compared with TimeDelay ARACNE and Transfer Entropy.
Example. In what follows, a synthetic example is used to highlight the difference between DTW and PCC for increasing time shift, with and without a moderate noise level. This example mimics a common situation in omics data, in which the activation of one gene induces the delayed activation of a previously inactive gene, with a similar expression level curve affected by a certain amount of noise. Consider the following time series on 100 time points $\{t_i = i : 1 \le i \le 100\}$:

$$r(i) = \frac{1}{10}\, e^{-\frac{2}{25} i}\, i^{3} \sin\left(\frac{3}{20} i\right),$$

whose graph is displayed in the top-left panel (yellow background) of Fig 2. Moreover, define the following family of time series derived from $r(i)$, for a time shift $s \in \mathbb{N}$ and a noise level $k \in \mathbb{R}_0^+$:

$$r_s^{[k]}(i) = \begin{cases} \varepsilon(k) & \text{for } i \le s \\ r(i-s) + \varepsilon\bigl(k \cdot r(i-s)\bigr) & \text{for } s < i \le 100. \end{cases}$$

In this notation, $r = r_0^{[0]}$. Finally, define the two functions

$$P : \mathbb{N} \times \mathbb{R}_0^+ \to [-1, 1], \qquad (s, k) \mapsto \mathrm{PCC}\bigl(r_s^{[k]}, r\bigr),$$

$$D : \mathbb{N} \times \mathbb{R}_0^+ \to [0, 1], \qquad (s, k) \mapsto \mathrm{DTW}_s\bigl(r_s^{[k]}, r\bigr).$$

Fig 2 shows the plots of the 15 time series, together with the corresponding values of $P(s, k)$ (italic) and $D(s, k)$ (boldface). Moreover, the top panel of Fig 3 displays the curves $P(s, k)$ (squares) and $D(s, k)$ (dots) for k = 0, 1, 2 (in black, blue and red respectively) versus the time shift s ranging from 0 to 40. The example shows that DTW can model the dependence between $r_s^{[k]}(i)$ and $r(i)$, even for a large time shift s and a high noise level k. In particular, as a function of the time shift s, the DTW value decreases monotonically from 1 to 0.959, 0.804 and 0.670 for k = 0, 1, 2 respectively, and $D(s, 0) > D(s, 1) > D(s, 2)$ holds consistently along the whole range 0 ≤ s ≤ 40. PCC, on the other hand, drops rapidly to very low correlation levels even for small time shifts s > 5, with PCC < 0.3 for all values s > 7. Furthermore, the PCC value does not change monotonically with increasing noise: the curves $P(s, k)$ intersect one another.
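The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise term ε(a) is taken here to be uniform on [-|a|, |a|] (the excerpt does not define it), the DTW distance uses the textbook dynamic program with absolute-difference cost, and the similarity is taken as 1 / (1 + length-normalized distance). The function names `shifted`, `dtw_distance` and `dtw_similarity` are hypothetical, and the printed numbers are not expected to reproduce the values reported in Fig 2.

```python
import numpy as np

def r(i):
    """Reference signal r(i) = (1/10) * exp(-2i/25) * i^3 * sin(3i/20)."""
    i = np.asarray(i, dtype=float)
    return 0.1 * np.exp(-2.0 * i / 25.0) * i**3 * np.sin(3.0 * i / 20.0)

def shifted(s, k, n=100, rng=None):
    """Series r_s^[k]: r delayed by s time points, with noise scaled by k.

    ASSUMPTION: eps(a) is uniform on [-|a|, |a|]; the source leaves eps undefined.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    i = np.arange(1, n + 1)
    out = np.empty(n)
    head = i <= s
    # For i <= s the series is pure noise eps(k).
    out[head] = k * rng.uniform(-1.0, 1.0, size=int(head.sum()))
    # For i > s it is the delayed signal plus noise eps(k * r(i - s)).
    base = r(i[~head] - s)
    out[~head] = base + np.abs(k * base) * rng.uniform(-1.0, 1.0, size=base.size)
    return out

def pcc(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW with |x_a - y_b| local cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            d = abs(x[a - 1] - y[b - 1])
            cost[a, b] = d + min(cost[a - 1, b], cost[a, b - 1], cost[a - 1, b - 1])
    return float(cost[n, m])

def dtw_similarity(x, y):
    """ASSUMPTION: similarity in (0, 1] as 1 / (1 + length-normalized distance)."""
    return 1.0 / (1.0 + dtw_distance(x, y) / (len(x) + len(y)))

ref = r(np.arange(1, 101))
for s in (0, 10, 20):
    for k in (0, 1, 2):
        x = shifted(s, k)
        print(f"s={s:2d} k={k}  PCC={pcc(x, ref):+.3f}  DTW_s={dtw_similarity(x, ref):.3f}")
```

Because warping lets one series wait while the other catches up, the DTW cost of a delayed copy is never larger than the point-by-point difference, whereas PCC compares values only at identical time indices and therefore degrades as soon as the two curves fall out of phase.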
Finally, to assess the significance of the values D(s, k), we compare them against a reference distribution computed from a set of 2N random vectors $\eta_j$ on 100 time points, with values sampled uniformly at random between two positive real values m < M. As parameters we use N = 1000, with m and M set according to the noise level k. For all three cases k = 0, 1, 2, the distribution of this set is Gaussian-shaped, and the 95% Student bootstrap confidence intervals around the mean are quite narrow, namely (0.7429, 0.7441), (0.6570, 0.6584) and (0.5115, 0.5130) for k = 0, 1, 2 respectively. Thus the mean values, i.e., 0.7435 (k = 0), 0.6577 (k = 1) and 0.5121 (k = 2), can be used as significance thresholds, as shown in the bottom panel of Fig 3: in all three cases, for the whole range 0 ≤ s ≤ 40, the curve D(s, k) lies above the corresponding significance threshold value.
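A rough sketch of how such a threshold could be computed is given below. Several details are assumptions rather than the authors' procedure: the 2N random vectors are paired consecutively to yield N similarity values, the bounds `m` and `M` are placeholders (the excerpt does not give them), "Student bootstrap" is read as the studentized (bootstrap-t) interval, and the DTW similarity is 1 / (1 + length-normalized distance). All function names are hypothetical.

```python
import numpy as np

def dtw_similarity(x, y):
    """ASSUMPTION: 1 / (1 + length-normalized DTW distance), |x_a - y_b| cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            d = abs(x[a - 1] - y[b - 1])
            cost[a, b] = d + min(cost[a - 1, b], cost[a, b - 1], cost[a - 1, b - 1])
    return 1.0 / (1.0 + cost[n, m] / (n + m))

def random_pair_similarities(n_pairs, length, m, M, rng=None):
    """Similarities between consecutive pairs of 2*n_pairs uniform random vectors.

    ASSUMPTION: vectors are paired as (eta_0, eta_1), (eta_2, eta_3), ...
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eta = rng.uniform(m, M, size=(2 * n_pairs, length))
    return np.array([dtw_similarity(eta[2 * j], eta[2 * j + 1]) for j in range(n_pairs)])

def student_bootstrap_ci(values, n_boot=2000, alpha=0.05, rng=None):
    """Studentized (bootstrap-t) confidence interval for the mean of `values`."""
    rng = np.random.default_rng(0) if rng is None else rng
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    se = values.std(ddof=1) / np.sqrt(n)
    # Resample with replacement and studentize each resampled mean.
    samples = values[rng.integers(0, n, size=(n_boot, n))]
    boot_se = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t = (samples.mean(axis=1) - mean) / boot_se
    q_lo, q_hi = np.quantile(t, [alpha / 2, 1 - alpha / 2])
    # Bootstrap-t interval: the quantiles swap sides when inverted.
    return mean - q_hi * se, mean - q_lo * se

# Small demo; the source uses N = 1000 pairs on 100 time points.
# m = 1.0 and M = 2.0 are placeholder bounds, not the values from the paper.
sims = random_pair_similarities(n_pairs=50, length=100, m=1.0, M=2.0)
low, high = student_bootstrap_ci(sims)
print(f"threshold (mean) = {sims.mean():.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

The mean of the random-pair similarities plays the role of the significance threshold: a D(s, k) value above it indicates a similarity larger than what unrelated random series typically produce under these assumptions.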