Comparison of pattern detection methods in microarray time series of the segmentation clock.

Dequéant ML, Ahnert S, Edelsbrunner H, Fink TM, Glynn EF, Hattem G, Kudlicki A, Mileyko Y, Morton J, Mushegian AR, Pachter L, Rowicka M, Shiu A, Sturmfels B, Pourquié O - PLoS ONE (2008)

Bottom Line: A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called Phase consistency, Address reduction, Cyclohedron test and Stable persistence, are based on different conceptual frameworks that are either hypothesis- or data-driven.


Affiliation: Stowers Institute for Medical Research, Kansas City, Missouri, United States of America.

ABSTRACT
While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of noisy data (tens of thousands of profiles) remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called Phase consistency, Address reduction, Cyclohedron test and Stable persistence, are based on different conceptual frameworks that are either hypothesis- or data-driven. Unlike Fourier transforms, some of these methods do not depend on the assumption that the pattern of interest is periodic. Remarkably, these methods blindly identified the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos.
Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.


Figure 6 (pone-0002856-g006). Address reduction. The tree representations of the block structures for the blocking functions γΔ1 (left) and γ+− (right) (γ+− is the number of permutations with a given sequence of rises and falls [14]). Locating a given curve using the two-part address is equivalent to starting at the centre of the tree (A) and finding a particular exit at the edge (e.g., B). Address reduction μA gives the reduction in information, measured in bits, needed to get from A to some B, compared to the information needed to locate B explicitly. In the case above, the endpoint B, being in a block of four, could correspond to the permutation (4, 5, 3, 2, 1) (this and three other permutations have γΔ1 = 5). To find it, someone starting at A would require log2 8 + log2 4 = 5 bits of information (8 paths to choose from, then 4 paths to choose from), which is μA = 1.91 bits less than the log2 5! ≈ 6.91 bits required to transmit (4, 5, 3, 2, 1) explicitly.
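The caption's arithmetic can be checked by brute force. The Python sketch below assumes that γΔ1 is the bounded variation described in the Mentions paragraph (the sum of absolute differences between consecutive ranks); it enumerates all 5! permutations, groups them into blocks by their γΔ1 value, and recomputes μA for (4, 5, 3, 2, 1). The helper name `total_variation` is hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def total_variation(perm):
    """Bounded variation gamma_Delta1: sum of |pi_{i+1} - pi_i| over consecutive ranks."""
    return sum(abs(b - a) for a, b in zip(perm, perm[1:]))


n = 5
# Partition all n! permutations of the ranks 1..n into blocks keyed by their gamma value.
block_sizes = Counter(total_variation(p) for p in permutations(range(1, n + 1)))

target = (4, 5, 3, 2, 1)
gamma = total_variation(target)   # 1 + 2 + 1 + 1 = 5
image_size = len(block_sizes)     # number of distinct gamma values, i.e. |Im(gamma)|
block_size = block_sizes[gamma]   # number of permutations sharing gamma = 5

explicit_bits = math.log2(math.factorial(n))                  # log2 5!
address_bits = math.log2(image_size) + math.log2(block_size)  # first + second part of address
mu_A = explicit_bits - address_bits

print(gamma, image_size, block_size, round(mu_A, 2))  # prints: 5 8 4 1.91
```

For n = 5 the bounded variation takes the eight values 4 through 11, which is where the "8 paths to choose from" in the first step of the address comes from.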


Mentions: Mathematical description. We first partition the space of permutations into blocks using some blocking function γA that maps each permutation to a real number; permutations with the same number are in the same block. Second, we base the measure of an expression profile f on the size of the block that contains it and on the total number of possible blocks. Then the number of bits μA(f) by which f can be compressed is

$$\mu_A(f) = \log_2 n! - \log_2 \lvert \mathrm{Im}(\gamma_A) \rvert - \log_2 \lvert \gamma_A^{-1}(\gamma_A(\pi(f))) \rvert,$$

where n is the number of data points in f, $\gamma_A^{-1}(\gamma_A(\pi(f)))$ is the set of permutations mapping to the same block as π(f), and Im(γA), the image of γA, is the set of possible values that the blocking function can take on. The vertical bars denote the number of elements in the set shown between them. Subtracting the logarithm of the image size allows comparison between different blocking functions and between curves with different numbers of data points, which we do not consider here. In the application to the mouse embryo data, we use what is sometimes called the bounded variation,

$$\gamma_{\Delta 1}(\pi) = \sum_{i=1}^{n-1} \lvert \pi_{i+1} - \pi_i \rvert,$$

where πi is the rank of fi in the sorted order (see Figure 6 left for an example). Other blocking functions can be used: see Figure 1 right and [31], as well as the discussions of the methods C (Cyclohedron test) and S (Stable persistence). Further details about address reduction can be found in [14], [31].
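As a sketch of how μA(f) might be evaluated for a real-valued expression profile, the Python code below maps f to its rank permutation π(f) and computes the compression for two blocking functions: the bounded variation γΔ1 and an up/down signature standing in for γ+−. Several assumptions are made here and are not from the paper: the profile has no tied values; the rise/fall blocking is encoded as a string label rather than a real number, which produces the same partition into blocks; the profile values and all function names are hypothetical; and block sizes are found by enumerating all n! permutations, which is only practical for short profiles.

```python
import math
from collections import Counter
from itertools import permutations


def rank_permutation(f):
    """pi(f): replace each value f_i by its rank in sorted order (1 = smallest).

    Assumes the profile has no tied values.
    """
    order = sorted(range(len(f)), key=lambda i: f[i])
    ranks = [0] * len(f)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return tuple(ranks)


def gamma_delta1(perm):
    """Bounded variation: sum of |pi_{i+1} - pi_i| over consecutive time points."""
    return sum(abs(b - a) for a, b in zip(perm, perm[1:]))


def gamma_rise_fall(perm):
    """Hypothetical stand-in for gamma_{+-}: the up/down signature, e.g. '+---'.

    A string label gives the same partition as any injective numeric encoding.
    """
    return "".join("+" if b > a else "-" for a, b in zip(perm, perm[1:]))


def address_reduction(f, gamma):
    """mu_A(f) = log2 n! - log2 |Im(gamma)| - log2 |block containing pi(f)|."""
    n = len(f)
    # Brute force: tally block sizes over all n! permutations (small n only).
    block_sizes = Counter(gamma(p) for p in permutations(range(1, n + 1)))
    block = block_sizes[gamma(rank_permutation(f))]
    return (math.log2(math.factorial(n))
            - math.log2(len(block_sizes))
            - math.log2(block))


profile = [0.8, 1.3, 0.2, -0.4, -1.1]  # hypothetical values; ranks (4, 5, 3, 2, 1)
print(round(address_reduction(profile, gamma_delta1), 2))     # 1.91
print(round(address_reduction(profile, gamma_rise_fall), 2))  # 0.91
```

Under these assumptions the same profile compresses by about 1.91 bits under γΔ1 but only about 0.91 bits under the rise/fall blocking. Its block has four members in both cases, but for n = 5 the rise/fall function has 16 possible values instead of 8, so the first part of the address costs one extra bit. This is the image-size term that makes different blocking functions comparable.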

