Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana.
Bottom Line:
The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.
Affiliation: Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland.
ABSTRACT
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees (http://ekhidna.biocenter.helsinki.fi/poxo/multigo). High-throughput gene expression measuring techniques, such as microarrays, are now often used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist data analysis and interpretation are essential. MultiGO automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.
Mentions: MultiGO searches for the optimal cluster set, i.e. the best cutting level of the expression tree, and the corresponding key biological functions by calculating the overall statistical significance of the GO-terms of the clusters located at a given level of similarity (Figure 1). This calculation is repeated at every level of similarity in the tree, starting from the single gene level and ending at the level of one cluster containing all the genes of the experiment. The overall statistical significance is calculated using Fisher's combined probability test (Equation 3, Figure 1) (17):

$$\chi_F^2 = -2 \sum_{i=1}^{k} \ln(P_i) \qquad (3)$$

In Equation 3, $P_i$ is the corrected P-value of the most significant GO-term of the $i$th cluster and $k$ is the number of clusters created at the position or passing it, i.e. clusters created farther from the root and bypassing the position. The overall P-value for the given clusters is then calculated from the test score using the chi-square distribution with $2k$ degrees of freedom. When calculating the significance, $P_i$ is set to one for clusters that do not contain significant GO-terms and for clusters that have been filtered, i.e. clusters that contain an improper number of genes according to the corresponding parameters (see Supplementary Data for a more detailed description of the parameters). Fisher's combined probability test assumes that the P-values to be combined ($P_i$) are independent (17). MultiGO aims for independence by using non-overlapping clusters and only the P-value of the best GO-term of each cluster. The use of non-overlapping clusters guarantees that the GO-terms combined in Equation 3 are not influenced by shared genes, whereas choosing only the GO-term with the most significant P-value avoids the correlations between GO-terms that arise from the structure of the Gene Ontology directed acyclic graph (DAG).
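The combination step described above can be illustrated with a minimal Python sketch. It is not MultiGO's own code: the function name `fisher_combined` and the example P-values are hypothetical, and the input is assumed to be one corrected P-value per non-overlapping cluster at a given cutting level, with filtered or non-significant clusters already set to 1. Because the chi-square distribution here always has an even number of degrees of freedom (2k), its upper tail has a closed form, so only the standard library is needed.

```python
import math


def fisher_combined(p_values):
    """Combine independent P-values with Fisher's combined probability test.

    Returns (statistic, overall_p), where
    statistic = -2 * sum(ln(P_i)) and overall_p is the upper tail of the
    chi-square distribution with 2k degrees of freedom (k = number of P-values).
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For 2k degrees of freedom the survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical cutting level with three clusters; the third cluster had no
# significant GO-term (or was filtered), so its P-value is set to 1.
statistic, overall_p = fisher_combined([0.001, 0.02, 1.0])
print(f"chi-square = {statistic:.3f}, overall P = {overall_p:.3g}")
```

A cluster with a P-value of 1 adds nothing to the test score but still raises the degrees of freedom by two, so a cutting level with many uninformative clusters is penalised relative to a level with the same significant clusters and fewer uninformative ones.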