Network-based segmentation of biological multivariate time series.

Omranian N, Klie S, Mueller-Roeber B, Nikoloski Z - PLoS ONE (2013)

Bottom Line: As a result, MTS data capture the dynamics of biochemical processes and components whose couplings may involve different scales and exhibit temporal changes. We demonstrate that the problem of partitioning MTS data into [Formula: see text] segments to maximize a distance function, operating on polynomially computable network properties often used in the analysis of biological networks, can be efficiently solved. To enable biological interpretation, we also propose a breakpoint-penalty (BP-penalty) formulation for determining MTS segmentation, which combines a distance function with the number/length of segments.


Affiliation: Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, Germany.

ABSTRACT
Molecular phenotyping technologies (e.g., transcriptomics, proteomics, and metabolomics) offer the possibility to simultaneously obtain multivariate time series (MTS) data from different levels of information processing and metabolic conversions in biological systems. As a result, MTS data capture the dynamics of biochemical processes and components whose couplings may involve different scales and exhibit temporal changes. Therefore, it is important to develop methods for determining the time segments in MTS data, which may correspond to critical biochemical events reflected in the coupling of the system's components. Here we provide a novel network-based formalization of the MTS segmentation problem based on temporal dependencies and the covariance structure of the data. We demonstrate that the problem of partitioning MTS data into [Formula: see text] segments to maximize a distance function, operating on polynomially computable network properties often used in the analysis of biological networks, can be efficiently solved. To enable biological interpretation, we also propose a breakpoint-penalty (BP-penalty) formulation for determining MTS segmentation, which combines a distance function with the number/length of segments. Our empirical analyses of synthetic benchmark data as well as time-resolved transcriptomics data from the metabolic and cell cycles of Saccharomyces cerevisiae demonstrate that the proposed method accurately infers the phases in the temporal compartmentalization of biological processes. In addition, through comparison on the same data sets, we show that the results from the proposed formalization of the MTS segmentation problem match biological knowledge and provide more rigorous statistical support than the contending state-of-the-art methods.
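
The segmentation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the network of a segment is built by thresholding absolute Pearson correlations between variables, takes relative density as the network property, and uses the absolute difference in density between adjacent segments as the distance. The function names (`pearson`, `density`, `segment`), the threshold of 0.8, and the minimum segment length are hypothetical choices for illustration only, and the BP-penalty formulation is not covered. A dynamic program over the last segment keeps the search polynomial in the number of time points.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def density(data, s, t, threshold=0.8):
    """Assumed network property: fraction of variable pairs with |r| >= threshold
    on the time points s..t-1 (0-based, end exclusive)."""
    rows = [row[s:t] for row in data]
    pairs = list(combinations(rows, 2))
    return sum(abs(pearson(x, y)) >= threshold for x, y in pairs) / len(pairs)


def segment(data, k, min_len=3, threshold=0.8):
    """Split the time axis into k segments maximizing the summed absolute
    density difference between adjacent segments.

    data is a list of variables, each a list of measurements over time.
    Returns (score, cuts), where cuts holds the 0-based start index of
    every segment after the first.
    """
    n = len(data[0])
    if k < 1 or n < k * min_len:
        raise ValueError("series too short for k segments of min_len points")
    prop = {(s, t): density(data, s, t, threshold)
            for s in range(n) for t in range(s + min_len, n + 1)}
    # best[(s, t)]: best (score, cuts) for covering [0, t) with last segment [s, t)
    best = {(0, t): (0.0, []) for t in range(min_len, n + 1)}
    for _ in range(k - 1):
        nxt = {}
        for (r, s), (score, cuts) in best.items():
            for t in range(s + min_len, n + 1):
                cand = score + abs(prop[(r, s)] - prop[(s, t)])
                if (s, t) not in nxt or cand > nxt[(s, t)][0]:
                    nxt[(s, t)] = (cand, cuts + [s])
        best = nxt
    finals = [value for (s, t), value in best.items() if t == n]
    return max(finals, key=lambda value: value[0])
```

For example, `segment(data, k=4)` on a list of 70 variables with 36 time points each would return the best score and three cut positions under these assumptions. Other network properties or distance functions could be swapped in by replacing `density` and the `abs(...)` term.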

pone-0062974-g004: Illustration of the segmentation for synthetic data with relative density as network property. The resulting partitions are highlighted in light grey and the simulated segmentation points are marked with red bars.

Mentions: To investigate the performance of the algorithm, we created synthetic time series data for 70 variables over 36 time points (see Fig. 4). The segmentation points correspond to the time points 7, 12 and 21. To create these segmentation points, a number of data profiles were generated for each segment by simulating a zero-mean autoregressive moving average (ARIMA) model using arima.sim in R [35]. The number of profiles simulated for the four segments, [1, 7], [8, 12], [13, 21], and [22, 36], was set to 2, 6, 3, and 7, respectively. Each of the 70 variables was obtained by randomly sampling a characteristic data profile in each segment. In addition, a normally distributed error term, [Formula: see text], was added to the sampled profile value at each time point. Finally, to simulate the temporal dependence between two adjacent segments, the boundaries between two segments of each variable were smoothed using a discrete linear filter approximating a Gaussian kernel. To this end, for each obtained profile, the simulated measurement at time point [Formula: see text], where [Formula: see text] is the left boundary of each segment, i.e., [Formula: see text], was replaced by [Formula: see text].
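
The data-generation procedure described above can be approximated with the following sketch. The segment boundaries, profile counts, and number of variables come from the text. Everything else is an assumption, because the source does not state it here: the ARMA(1,1) coefficients, the noise standard deviation, and the three-point kernel (0.25, 0.5, 0.25) standing in for the Gaussian-like filter. A hand-written ARMA recursion replaces R's arima.sim, and the names `simulate_arma` and `make_synthetic` are hypothetical.

```python
import random

# Taken from the text: 1-based inclusive segments and profiles per segment.
SEGMENTS = [(1, 7), (8, 12), (13, 21), (22, 36)]
N_PROFILES = [2, 6, 3, 7]
N_VARIABLES = 70


def simulate_arma(n, rng, ar=0.5, ma=0.3, burn_in=50):
    """Zero-mean ARMA(1,1) series of length n; the coefficients are assumed."""
    prev_x, prev_eps, out = 0.0, 0.0, []
    for _ in range(n + burn_in):
        eps = rng.gauss(0.0, 1.0)
        prev_x = ar * prev_x + eps + ma * prev_eps
        prev_eps = eps
        out.append(prev_x)
    return out[-n:]  # drop the burn-in so the start value has little influence


def make_synthetic(seed=0, noise_sd=0.1):
    """Return a list of N_VARIABLES series covering all segments."""
    rng = random.Random(seed)
    n_time = SEGMENTS[-1][1]
    data = [[0.0] * n_time for _ in range(N_VARIABLES)]
    for (start, end), n_prof in zip(SEGMENTS, N_PROFILES):
        length = end - start + 1
        profiles = [simulate_arma(length, rng) for _ in range(n_prof)]
        for row in data:
            # Each variable follows one randomly chosen profile per segment.
            row[start - 1:end] = rng.choice(profiles)
    # Additive normally distributed error (standard deviation is assumed).
    noisy = [[v + rng.gauss(0.0, noise_sd) for v in row] for row in data]
    # Smooth the left boundary of every segment after the first with an
    # assumed three-point kernel approximating a Gaussian.
    smoothed = [row[:] for row in noisy]
    for start, _ in SEGMENTS[1:]:
        b = start - 1  # 0-based index of the segment's first time point
        for src, dst in zip(noisy, smoothed):
            dst[b] = 0.25 * src[b - 1] + 0.5 * src[b] + 0.25 * src[b + 1]
    return smoothed
```

Calling `make_synthetic()` yields 70 series of 36 values each. Passing the same `seed` reproduces the same data set, which is convenient when comparing segmentation methods on identical input.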

