Limits...
SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data.

Perea JA, Deckard A, Haase SB, Harer J - BMC Bioinformatics (2015)

Bottom Line: Results on biological data are also analyzed and compared.In data from the Yeast cell cycle SW1PerS identifies genes not preferred by other algorithms, hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov.Indeed, by having an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and the generation of hypotheses regarding the structure of gene regulatory networks.

View Article: PubMed Central - PubMed

Affiliation: Department of Mathematics, Duke University, Science Dr, Durham, 27708, NC, USA. joperea@math.duke.edu.

ABSTRACT

Background: Identifying periodically expressed genes across different processes (e.g. the cell and metabolic cycles, circadian rhythms, etc) is a central problem in computational biology. Biological time series may contain (multiple) unknown signal shapes of systemic relevance, imperfections like noise, damping, and trending, or limited sampling density. While there exist methods for detecting periodicity, their design biases (e.g. toward a specific signal shape) can limit their applicability in one or more of these situations.

Methods: We present in this paper a novel method, SW1PerS, for quantifying periodicity in time series in a shape-agnostic manner and with resistance to damping. The measurement is performed directly, without presupposing a particular pattern, by evaluating the circularity of a high-dimensional representation of the signal. SW1PerS is compared to other algorithms using synthetic data and performance is quantified under varying noise models, noise levels, sampling densities, and signal shapes. Results on biological data are also analyzed and compared.

Results: On the task of periodic/not-periodic classification, using synthetic data, SW1PerS outperforms all other algorithms in the low-noise regime. SW1PerS is shown to be the most shape-agnostic of the evaluated methods, and the only one to consistently classify damped signals as highly periodic. On biological data, and for several experiments, the lists of top 10% genes ranked with SW1PerS recover up to 67% of those generated with other popular algorithms. Moreover, the list of genes from data on the Yeast metabolic cycle which are highly-ranked only by SW1PerS, contains evidently non-cosine patterns (e.g. ECM33, CDC9, SAM1,2 and MSH6) with highly periodic expression profiles. In data from the Yeast cell cycle SW1PerS identifies genes not preferred by other algorithms, hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov. These genes are BOP3, CDC10, YIL108W, YER034W, MLP1, PAC2 and RTT101.

Conclusions: In biological systems with low noise, i.e. where periodic signals with interesting shapes are more likely to occur, SW1PerS can be used as a powerful tool in exploratory analyses. Indeed, by having an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and the generation of hypotheses regarding the structure of gene regulatory networks.

No MeSH data available.


Biases for curve shapes for each algorithm (rows). Distributions of scores are by shape with no noise (Gaussian noise SD=0). The x-axis shows the log of the scores, ranging from the lowest (best score) to the highest (worst score) returned by the algorithm. The y-axis shows the number of signals receiving the score. Please refer to an electronic version for colors
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4537550&req=5

Fig4: Biases for curve shapes for each algorithm (rows). Distributions of scores are by shape with no noise (Gaussian noise SD=0). The x-axis shows the log of the scores, ranging from the lowest (best score) to the highest (worst score) returned by the algorithm. The y-axis shows the number of signals receiving the score. Please refer to an electronic version for colors

Mentions: In our second analysis, we examined how biased each algorithm was toward each signal shape. This can be visualized by plotting the distribution, as a histogram, of periodicity scores for all instances of all signal shapes in the synthetic data (Fig. 4). When one shape consistently receives better scores than all others, the algorithm is biased towards this shape. For JTK and LS, we can see a strong bias for cosine signals, which receive the best scores (Fig. 4). DL groups most exemplars at an intermediate level, except for peak2 and contracting signals which receive worse scores, and the trended signals which are distributed across a wide range. For SW1PerS, there is a mixture of cosine, cosine 2, cosine damped, and square signals near the top of the rankings. These are followed closely by peaked and sawtooth signals. The plots of score distributions for each algorithm, number of samples, noise level, and shape can be seen in the supplement (Figures S15-S34). As the noise level increases, these divisions by shape become further blurred. In summary, SW1PerS is the method with the most shape variation for signals scored as highly periodic, and the only one to include damped shapes at the top of its rankings.Fig. 4


SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data.

Perea JA, Deckard A, Haase SB, Harer J - BMC Bioinformatics (2015)

Biases for curve shapes for each algorithm (rows). Distributions of scores are by shape with no noise (Gaussian noise SD=0). The x-axis shows the log of the scores, ranging from the lowest (best score) to the highest (worst score) returned by the algorithm. The y-axis shows the number of signals receiving the score. Please refer to an electronic version for colors
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4537550&req=5

Fig4: Biases for curve shapes for each algorithm (rows). Distributions of scores are by shape with no noise (Gaussian noise SD=0). The x-axis shows the log of the scores, ranging from the lowest (best score) to the highest (worst score) returned by the algorithm. The y-axis shows the number of signals receiving the score. Please refer to an electronic version for colors
Mentions: In our second analysis, we examined how biased each algorithm was toward each signal shape. This can be visualized by plotting the distribution, as a histogram, of periodicity scores for all instances of all signal shapes in the synthetic data (Fig. 4). When one shape consistently receives better scores than all others, the algorithm is biased towards this shape. For JTK and LS, we can see a strong bias for cosine signals, which receive the best scores (Fig. 4). DL groups most exemplars at an intermediate level, except for peak2 and contracting signals which receive worse scores, and the trended signals which are distributed across a wide range. For SW1PerS, there is a mixture of cosine, cosine 2, cosine damped, and square signals near the top of the rankings. These are followed closely by peaked and sawtooth signals. The plots of score distributions for each algorithm, number of samples, noise level, and shape can be seen in the supplement (Figures S15-S34). As the noise level increases, these divisions by shape become further blurred. In summary, SW1PerS is the method with the most shape variation for signals scored as highly periodic, and the only one to include damped shapes at the top of its rankings.Fig. 4

Bottom Line: Results on biological data are also analyzed and compared.In data from the Yeast cell cycle SW1PerS identifies genes not preferred by other algorithms, hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov.Indeed, by having an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and the generation of hypotheses regarding the structure of gene regulatory networks.

View Article: PubMed Central - PubMed

Affiliation: Department of Mathematics, Duke University, Science Dr, Durham, 27708, NC, USA. joperea@math.duke.edu.

ABSTRACT

Background: Identifying periodically expressed genes across different processes (e.g. the cell and metabolic cycles, circadian rhythms, etc) is a central problem in computational biology. Biological time series may contain (multiple) unknown signal shapes of systemic relevance, imperfections like noise, damping, and trending, or limited sampling density. While there exist methods for detecting periodicity, their design biases (e.g. toward a specific signal shape) can limit their applicability in one or more of these situations.

Methods: We present in this paper a novel method, SW1PerS, for quantifying periodicity in time series in a shape-agnostic manner and with resistance to damping. The measurement is performed directly, without presupposing a particular pattern, by evaluating the circularity of a high-dimensional representation of the signal. SW1PerS is compared to other algorithms using synthetic data and performance is quantified under varying noise models, noise levels, sampling densities, and signal shapes. Results on biological data are also analyzed and compared.

Results: On the task of periodic/not-periodic classification, using synthetic data, SW1PerS outperforms all other algorithms in the low-noise regime. SW1PerS is shown to be the most shape-agnostic of the evaluated methods, and the only one to consistently classify damped signals as highly periodic. On biological data, and for several experiments, the lists of top 10% genes ranked with SW1PerS recover up to 67% of those generated with other popular algorithms. Moreover, the list of genes from data on the Yeast metabolic cycle which are highly-ranked only by SW1PerS, contains evidently non-cosine patterns (e.g. ECM33, CDC9, SAM1,2 and MSH6) with highly periodic expression profiles. In data from the Yeast cell cycle SW1PerS identifies genes not preferred by other algorithms, hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov. These genes are BOP3, CDC10, YIL108W, YER034W, MLP1, PAC2 and RTT101.

Conclusions: In biological systems with low noise, i.e. where periodic signals with interesting shapes are more likely to occur, SW1PerS can be used as a powerful tool in exploratory analyses. Indeed, by having an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and the generation of hypotheses regarding the structure of gene regulatory networks.

No MeSH data available.