Comparison of pattern detection methods in microarray time series of the segmentation clock.

Dequéant ML, Ahnert S, Edelsbrunner H, Fink TM, Glynn EF, Hattem G, Kudlicki A, Mileyko Y, Morton J, Mushegian AR, Pachter L, Rowicka M, Shiu A, Sturmfels B, Pourquié O - PLoS ONE (2008)



Affiliation: Stowers Institute for Medical Research, Kansas City, Missouri, United States of America.

ABSTRACT
While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large, noisy datasets (tens of thousands of profiles) remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called Phase consistency, Address reduction, Cyclohedron test and Stable persistence, are based on different conceptual frameworks that are either hypothesis- or data-driven. Unlike Fourier transforms, some of these methods do not depend on the assumption that the pattern of interest is periodic. Remarkably, these methods blindly identified the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos.
Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.
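The Lomb-Scargle periodogram mentioned above scores how strongly a profile resembles a sinusoid at each candidate frequency, and unlike a plain discrete Fourier transform it accepts unevenly spaced time points. The sketch below is a minimal NumPy implementation of the textbook normalized form (Scargle's formulation, with the time offset tau that makes the sine and cosine terms orthogonal). It is not the code used in the study; the function name `lomb_scargle`, the frequency grid, the sample times and the synthetic profile are all hypothetical, and significance testing (false-alarm probability, permutations) is omitted.

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Classic normalized Lomb-Scargle periodogram (illustrative sketch).

    t     : sample times (may be unevenly spaced)
    y     : expression values at those times
    freqs : ordinary frequencies to evaluate, in cycles per time unit (> 0)
    Returns one power value per frequency; larger values indicate a
    stronger sinusoidal component at that frequency.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()            # center the profile before fitting
    var = y.var(ddof=1)          # sample variance used for normalization
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f      # angular frequency
        # tau makes the sine and cosine terms orthogonal, so the result
        # does not depend on where the time origin is placed.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = ((yc @ c) ** 2 / (c @ c) + (yc @ s) ** 2 / (s @ s)) / (2 * var)
    return power


# Hypothetical usage: a noisy profile with a period of 2 time units,
# sampled at 60 irregular time points.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 60))
y = np.sin(2 * np.pi * 0.5 * t) + 0.1 * rng.standard_normal(t.size)
freqs = np.linspace(0.05, 2.0, 400)
p = lomb_scargle(t, y, freqs)
best = freqs[np.argmax(p)]       # expected to lie close to 0.5 cycles per unit
```

Ranking every profile in a dataset by its peak power (or by a significance derived from it) is the general idea behind using such a periodogram to prioritize candidate cyclic genes.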


pone-0002856-g002: Comparison of the intersection of the top 300 ranked probe sets from the five methods. (A) Venn diagram. (B) Hasse diagram showing the pairwise intersections of two lists, the triple intersections of three lists, and so on. The total number of distinct probe sets across the five top 300 lists (the union) is 884; the number of probe sets common to all five lists (the intersection) is 21. L, Lomb-Scargle analysis; P, Phase consistency; A, Address reduction; C, Cyclohedron test; S, Stable persistence.
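Each node of the Hasse diagram corresponds to one subset of the five methods, and the levels hold the pairwise, triple, quadruple and five-way intersections. The sketch below enumerates those non-empty subsets (5 + 10 + 10 + 5 + 1 = 31 nodes; the lattice of all subsets has 2^5 = 32 elements once the empty set is counted) and records the intersection size at each node. It is an illustration only: the function name `lattice_intersections` and the toy probe-set identifiers are hypothetical, and the real inputs would be the five published top 300 lists.

```python
from itertools import combinations


def lattice_intersections(top_lists):
    """Map every non-empty subset of methods (a node of the subset lattice)
    to the number of probe sets shared by all methods in that subset."""
    names = sorted(top_lists)
    sets = {m: set(ids) for m, ids in top_lists.items()}
    nodes = {}
    for k in range(1, len(names) + 1):        # level k = subsets of size k
        for combo in combinations(names, k):
            shared = set.intersection(*(sets[m] for m in combo))
            nodes[combo] = len(shared)
    return nodes


# Hypothetical toy lists standing in for the top 300 probe sets per method.
toy = {
    "L": ["a", "b", "c", "d"],
    "P": ["a", "b", "c", "e"],
    "A": ["a", "b", "f"],
    "C": ["a", "c", "g"],
    "S": ["a", "h"],
}
nodes = lattice_intersections(toy)
# nodes[("L", "P")] is the size of the L/P pairwise intersection;
# nodes[("A", "C", "L", "P", "S")] is the five-way intersection.
```

Keys are sorted tuples of method labels, so each node is looked up in a single canonical order regardless of how the input dictionary was built.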

Mentions: We next compared the intersection of the top 300 ranked probe sets from the four methods and method L. This is represented in Figure 2A as a five-set Venn diagram, in which each color corresponds to a different method, and in Figure 2B as a Hasse diagram in the form of the lattice of the subsets of a five-element set. The total number of distinct probe sets across the five sets (the union) is 884; the number of probe sets common to all five sets (the intersection) is 21. This overlap contains eight true positive cyclic genes (Supplementary Information, Table S7). Many candidate genes were identified by only one, two, three or four methods. The L, P and C methods identified larger numbers of unique genes (104, 160 and 154, respectively) than method A (67) and method S (47) (Figure 2). Although it is not possible to know whether all of the uniquely predicted genes are associated with the segmentation clock, many of them are biologically plausible candidates because they are associated with the Wnt pathway.
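The three summary figures used above (the union, the five-way intersection, and the number of candidates found by only one method) can be computed directly from ranked lists. The sketch below assumes each list is ordered best first and truncates it to the top n entries (n = 300 as in the comparison above). The function name `overlap_summary` and the toy identifiers are hypothetical; the sketch does not reproduce the study's data or counts.

```python
def overlap_summary(ranked_lists, n=300):
    """Summarize overlap among the top-n entries of several ranked lists.

    ranked_lists : {method label: identifiers ordered best first}
    Returns (union size, intersection size, {method: count found only by it}).
    """
    tops = {m: set(ids[:n]) for m, ids in ranked_lists.items()}
    union = set().union(*tops.values())
    common = set.intersection(*tops.values())
    unique = {}
    for m, s in tops.items():
        # Everything selected by at least one of the other methods.
        others = set().union(*(o for k, o in tops.items() if k != m))
        unique[m] = len(s - others)
    return len(union), len(common), unique


# Hypothetical toy rankings for the five methods.
ranked = {
    "L": ["a", "b", "c", "d", "i"],
    "P": ["a", "b", "c", "e"],
    "A": ["a", "b", "f"],
    "C": ["a", "c", "g"],
    "S": ["a", "h"],
}
print(overlap_summary(ranked))        # (9, 1, {'L': 2, 'P': 1, 'A': 1, 'C': 1, 'S': 1})
print(overlap_summary(ranked, n=3))   # (6, 1, {'L': 0, 'P': 0, 'A': 1, 'C': 1, 'S': 1})
```

Lowering the cut-off in the second call shows how "unique" counts depend on the chosen top-n threshold: entries that one method ranks highly may still appear lower in another method's list.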

