Testing gene set enrichment for subset of genes: Sub-GSE.

Yan X, Sun F - BMC Bioinformatics (2008)

Bottom Line: The results based on gene set analysis are generally more biologically interpretable, accurate and robust than the results based on individual gene analysis. This is particularly true for cases in which only a fraction of the genes in the gene set are associated with the phenotypes. Furthermore, the application of Sub-GSE to two real data sets demonstrates that it can detect more biologically meaningful gene sets than GSEA.


Affiliation: Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA. xitingya@usc.edu

ABSTRACT

Background: Many methods have been developed to test the enrichment of genes related to certain phenotypes or cell states in gene sets. These approaches usually combine gene expression data with functionally related gene sets as defined in databases such as GeneOntology (GO), KEGG, or BioCarta. The results based on gene set analysis are generally more biologically interpretable, accurate and robust than the results based on individual gene analysis. However, while most available methods for gene set enrichment analysis test the enrichment of the entire gene set, it is more likely that only a subset of the genes in the gene set is related to the phenotypes of interest.

Results: In this paper, we develop a novel method, termed Sub-GSE, which measures the enrichment of a predefined gene set, or pathway, by testing its subsets. The application of Sub-GSE to two simulated and two real datasets shows Sub-GSE to be more sensitive than previous methods, such as GSEA, GSA, and SigPath, in detecting gene sets associated with a phenotype of interest. This is particularly true for cases in which only a fraction of the genes in the gene set are associated with the phenotypes. Furthermore, the application of Sub-GSE to two real data sets demonstrates that it can detect more biologically meaningful gene sets than GSEA.

Conclusion: We developed a new method to measure gene set enrichment. Applications to two simulated datasets and two real datasets show that this method is sensitive to the associations between gene sets and phenotype. The program Sub-GSE can be downloaded from http://www-rcf.usc.edu/~fsun.


Figure 2: The distribution of p-values under the null hypothesis of no association. The histogram shows the p-values under the null hypothesis of no association between the gene sets and the phenotype.

Mentions: Second, the phenotypic data are independent of the expression levels of all the genes. Therefore, Sub-GSE should not detect any significant gene sets. Figure 2 shows the histogram of all the p-values of the 100 gene sets from the 100 data sets. The histogram illustrates that the p-values from Sub-GSE are uniformly distributed for gene sets that are not related to the phenotype, which is consistent with the theoretical uniform distribution under the null hypothesis.
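
The uniformity check described above can be illustrated with a small simulation. The sketch below is not the Sub-GSE procedure; it is a minimal stand-in that assumes a simple set-level statistic (the sum over genes of the squared difference in group means) and a label-permutation p-value. All function names, sample sizes, and the statistic itself are hypothetical choices made for illustration. It generates expression data that are independent of a binary phenotype, computes one p-value per simulated data set, and bins the p-values as a histogram would.

```python
import random


def set_statistic(expr, labels):
    """Sum over genes of the squared difference between the two group means.

    This is an illustrative statistic, not the one used by Sub-GSE.
    """
    total = 0.0
    for gene in expr:  # gene: list of expression values, one per sample
        g1 = [x for x, lab in zip(gene, labels) if lab == 1]
        g0 = [x for x, lab in zip(gene, labels) if lab == 0]
        diff = sum(g1) / len(g1) - sum(g0) / len(g0)
        total += diff * diff
    return total


def permutation_p_value(expr, labels, n_perm, rng):
    """Permutation p-value: fraction of shuffled labelings at least as extreme."""
    observed = set_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(expr, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)


def null_p_values(n_datasets=100, n_samples=12, set_size=5, n_perm=99, seed=0):
    """Simulate data sets whose expression is independent of the phenotype."""
    rng = random.Random(seed)
    labels = [0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2)
    p_values = []
    for _ in range(n_datasets):
        # Gaussian noise only: no gene carries any phenotype signal.
        expr = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
                for _ in range(set_size)]
        p_values.append(permutation_p_value(expr, labels, n_perm, rng))
    return p_values


def histogram(p_values, n_bins=10):
    """Count p-values per equal-width bin on [0, 1]; p == 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    ps = null_p_values()
    print(histogram(ps))  # roughly equal counts per bin are expected under the null
```

If the test is well calibrated, the bin counts should be roughly equal and the mean p-value should be close to 0.5; a pile-up near zero would indicate an inflated false-positive rate.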

