Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

Lewin A, Grieve IC - BMC Bioinformatics (2006)

Bottom Line: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.


Affiliation: Department of Epidemiology and Public Health, Imperial College, Norfolk Place, London W2 1PG, UK. a.m.lewin@imperial.ac.uk

ABSTRACT

Background: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult.

Results: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms.

Conclusion: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.



Figure 2: Sizes of POSOC groups for the U74A chip. Number of genes versus number of GO nodes for the POSOC groups for the U74A chip. The solid circles mark the three groups found significant in the group analysis for the Cd36 data set.

Mentions: We now analyze this data using POSOC groups. This data set is from the U74A Affymetrix chip, so the POSOC groups we use are those found using all genes on the U74A chip (and the Biological Process branch of the GO). There are 258 groups. Note that the number of nodes in the Biological Process branch of the Gene Ontology is around 4100, so we have greatly reduced the space on which we perform the statistical tests. Table 1 shows the frequencies of group sizes. Figure 2 shows a scatter plot of the number of genes versus number of nodes. The distributions are highly skewed, with most groups being made up of fewer than 5 GO nodes, and having fewer than 50 genes annotated.
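To give a feel for what reducing the test space from around 4100 nodes to 258 groups means, the short calculation below compares per-test significance thresholds under a Bonferroni correction. Bonferroni and the family-wise level of 0.05 are assumptions chosen for illustration only; this excerpt does not say which multiple-testing adjustment the authors use.

```python
# Illustrative only: Bonferroni is an assumed correction, not stated in the source.
alpha = 0.05          # assumed family-wise error level
n_term_tests = 4100   # approximate number of Biological Process nodes (from the text)
n_group_tests = 258   # number of POSOC groups for the U74A chip (from the text)

term_threshold = alpha / n_term_tests
group_threshold = alpha / n_group_tests
ratio = group_threshold / term_threshold

print(f"per-term threshold:  {term_threshold:.2e}")
print(f"per-group threshold: {group_threshold:.2e}")
print(f"group threshold is about {ratio:.1f} times less stringent")
```

Under these assumptions a group needs a far less extreme p-value than an individual term to be declared significant, which is one way in which testing fewer, less dependent hypotheses can increase power.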

