Limits...
Computation of significance scores of unweighted Gene Set Enrichment Analyses.

Keller A, Backes C, Lenhof HP - BMC Bioinformatics (2007)

Bottom Line: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins.Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes.Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values.

View Article: PubMed Central - HTML - PubMed

Affiliation: Center for Bioinformatics, Saarland University, Building E1 1, 66804 Saarbrücken, Germany. ack@bioinf.uni-sb.de

ABSTRACT

Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values.

Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.

Show MeSH

Related in: MedlinePlus

Dynamic Programming Matrix. The figure shows the dynamic programming matrix for the example provided in Figure 1. Matrix entries that are unequal zero are shaded. The yellow matrix entries do not have to be computed due to the extended side constraints, the number of running sum statistics with a smaller deviation of zero (RSC value) than 12 amounts to 54.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1994690&req=5

Figure 3: Dynamic Programming Matrix. The figure shows the dynamic programming matrix for the example provided in Figure 1. Matrix entries that are unequal zero are shaded. The yellow matrix entries do not have to be computed due to the extended side constraints, the number of running sum statistics with a smaller deviation of zero (RSC value) than 12 amounts to 54.

Mentions: Obviously, the first column M(·, 0) contains only one number unequal to zero, the second column two numbers, and the lth column l values unequal to zero (see Figure 1 and Figure 3). By using two standard STL hash maps instead of the matrix M, the space requirements can be reduced to O(l) and the expected runtime can be reduced to O(ml).


Computation of significance scores of unweighted Gene Set Enrichment Analyses.

Keller A, Backes C, Lenhof HP - BMC Bioinformatics (2007)

Dynamic Programming Matrix. The figure shows the dynamic programming matrix for the example provided in Figure 1. Matrix entries that are unequal zero are shaded. The yellow matrix entries do not have to be computed due to the extended side constraints, the number of running sum statistics with a smaller deviation of zero (RSC value) than 12 amounts to 54.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1994690&req=5

Figure 3: Dynamic Programming Matrix. The figure shows the dynamic programming matrix for the example provided in Figure 1. Matrix entries that are unequal zero are shaded. The yellow matrix entries do not have to be computed due to the extended side constraints, the number of running sum statistics with a smaller deviation of zero (RSC value) than 12 amounts to 54.
Mentions: Obviously, the first column M(·, 0) contains only one number unequal to zero, the second column two numbers, and the lth column l values unequal to zero (see Figure 1 and Figure 3). By using two standard STL hash maps instead of the matrix M, the space requirements can be reduced to O(l) and the expected runtime can be reduced to O(ml).

Bottom Line: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins.Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes.Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values.

View Article: PubMed Central - HTML - PubMed

Affiliation: Center for Bioinformatics, Saarland University, Building E1 1, 66804 Saarbrücken, Germany. ack@bioinf.uni-sb.de

ABSTRACT

Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values.

Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.

Show MeSH
Related in: MedlinePlus