From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems.

Geistlinger L, Csaba G, Küffner R, Mulder N, Zimmer R - Bioinformatics (2011)

Bottom Line: Current gene set enrichment approaches do not take interactions and associations between set members into account. GGEA improves the concordance of pairwise regulation with individual expression changes in respective pairs of regulating and regulated genes, compared with set enrichment methods.


Affiliation: Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany. Ludwig.Geistlinger@bio.ifi.lmu.de

ABSTRACT

Motivation: Current gene set enrichment approaches do not take interactions and associations between set members into account. Mutual activation and inhibition causing positive and negative correlation among set members are thus neglected. As a consequence, inconsistent regulations and contextless expression changes are reported and, thus, the biological interpretation of the result is impeded.

Results: We analyzed established gene set enrichment methods and their result sets in a large-scale investigation of 1000 expression datasets. The reported statistically significant gene sets exhibit only average consistency between the observed patterns of differential expression and known regulatory interactions. We present Gene Graph Enrichment Analysis (GGEA) to detect consistently and coherently enriched gene sets, based on prior knowledge derived from directed gene regulatory networks. Firstly, GGEA improves the concordance of pairwise regulation with individual expression changes in respective pairs of regulating and regulated genes, compared with set enrichment methods. Secondly, GGEA yields result sets where a large fraction of relevant expression changes can be explained by nearby regulators, such as transcription factors, again improving on set-based methods. Thirdly, we demonstrate in additional case studies that GGEA can be applied to human regulatory pathways, where it sensitively detects very specific regulation processes, which are altered in tumors of the central nervous system. GGEA significantly increases the detection of gene sets where measured positively or negatively correlated expression patterns coincide with directed inducing or repressing relationships, thus facilitating further interpretation of gene expression data.

Availability: The method and its accompanying visualization capabilities have been bundled into an R package and tied to a graphical user interface, the Galaxy workflow environment, which runs as a web server.

Contact: Ludwig.Geistlinger@bio.ifi.lmu.de; Ralf.Zimmer@bio.ifi.lmu.de.


Figure 2: Fuzzyfication of (a) the P-value and (b) the fold change. Both measures are mapped onto two main categories, each with a membership function that expresses the uncertainty of the mapping. Additional categories can be introduced for a more detailed representation, e.g. a third category, medium for the P-value and neutral for the fold change.

The most intuitive measure for the expression change of a single gene $g$ between two conditions is the fold change $fc(g)$, defined as the ratio of the estimated expression values of $g$ in the two sample groups $S_1$ and $S_2$:

$$fc(g) = \frac{expr(g, S_1)}{expr(g, S_2)}$$

where $expr(g, S)$ computes the mean expression level of gene $g$ in condition $S$. We compute t-test-derived P-values $p(g)$ to assess the statistical significance of the expression changes (Pan, 2002) and correct them for multiple testing. Both measures are log-transformed,

$$\widetilde{fc}(g) = \log_2 fc(g), \qquad \tilde{p}(g) = -\log p(g),$$

and the significance thresholds $\alpha = -\log(0.05)$ and $\beta = 1$ (2-fold) are used as defaults for $\tilde{p}$ and $\widetilde{fc}$, respectively. Such sharp thresholds are, of course, artificial: they discriminate drastically between genes just above and just below $\alpha$ or $\beta$. In addition, noise in the data, such as imprecise and erroneous measurements of gene expression values, must be expected and dealt with. Hence, we divide the range of both measures into two main categories and smooth the borders between them by introducing a degree of uncertainty, following the mathematical concept of fuzzyfication (Windhager and Zimmer, 2008; Windhager et al., 2010; Zadeh, 1963). For the fold change, we map $\widetilde{fc}(g)$ onto the categories down and up and compute membership values for both categories via the weighting functions displayed in Figure 2b, resulting in the pair

$$\widetilde{fc}(g) \mapsto \big(\mu_{\text{down}}(\widetilde{fc}(g)),\ \mu_{\text{up}}(\widetilde{fc}(g))\big).$$

Analogously, using Figure 2a, we map $\tilde{p}(g)$ to areas of low and high significance in the fuzzy concept:

$$\tilde{p}(g) \mapsto \big(\mu_{\text{low}}(\tilde{p}(g)),\ \mu_{\text{high}}(\tilde{p}(g))\big).$$

For both measures, a third category can optionally be introduced to account for unspecific signals in very noisy data. The fold change and P-value categories are then combined into a single measure of differential expression, which summarizes whether the transcriptional activity of a particular gene is reduced or enhanced in one sample group compared with the other.
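The fold change, log transformation and two-category fuzzyfication described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the GGEA package's implementation: the exact membership shapes are only shown in Figure 2, so the sketch hypothetically assumes linear ramps, with $\mu_{\text{up}}$ rising from 0 at $\log_2 fc = -\beta$ to 1 at $+\beta$, and $\mu_{\text{high}}$ rising from 0 at $\tilde{p} = 0$ to 1 at $\tilde{p} = \alpha$. The function names are hypothetical, P-values are assumed to be already corrected for multiple testing, and base 10 is assumed for $-\log p$ (the ramp only depends on $\tilde{p}/\alpha$, so the base cancels). The final combination into a single differential-expression measure is omitted because the text does not state the operator.

```python
import math
from statistics import mean

# Default thresholds from the text: alpha = -log(0.05), beta = 1 (2-fold).
# Base 10 for the P-value transform is an assumption.
ALPHA = -math.log10(0.05)
BETA = 1.0


def fold_change(expr_s1, expr_s2):
    """Ratio of the mean expression of one gene in sample groups S1 and S2."""
    return mean(expr_s1) / mean(expr_s2)


def linear_ramp(x, lo, hi):
    """Hypothetical membership shape: 0 at or below lo, 1 at or above hi,
    linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def fuzzify_fold_change(fc, beta=BETA):
    """Return (mu_down, mu_up) for a fold change, on the log2 scale.

    Assumes a linear transition between -beta and +beta; genes with
    |log2 fc| >= beta belong fully to 'down' or 'up'.
    """
    fc_t = math.log2(fc)
    mu_up = linear_ramp(fc_t, -beta, beta)
    return (1.0 - mu_up, mu_up)


def fuzzify_pvalue(p, alpha=ALPHA):
    """Return (mu_low, mu_high) significance memberships for an adjusted P-value.

    Assumes a linear transition from -log10(p) = 0 up to alpha.
    """
    p_t = -math.log10(p)
    mu_high = linear_ramp(p_t, 0.0, alpha)
    return (1.0 - mu_high, mu_high)


if __name__ == "__main__":
    # Hypothetical gene: mean expression 6.0 in S1 vs 2.0 in S2, adjusted p = 0.01.
    fc = fold_change([5.0, 7.0], [1.5, 2.5])
    print(fc)                        # 3.0
    print(fuzzify_fold_change(fc))   # (0.0, 1.0), since log2(3) > beta
    print(fuzzify_pvalue(0.01))      # (0.0, 1.0), since -log10(0.01) = 2 > alpha
    print(fuzzify_fold_change(1.0))  # (0.5, 0.5): no change sits mid-ramp
```

The linear ramp is simply the easiest shape that reproduces the intended behavior: genes just above and just below a threshold receive similar memberships rather than opposite hard labels. A sigmoid, or the shapes in Figure 2, could be substituted without changing the rest of the sketch.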