Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana.

Kankainen M, Brader G, Törönen P, Palva ET, Holm L - Nucleic Acids Res. (2006)

Bottom Line: The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights to the experiment of interest.


Affiliation: Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland.

ABSTRACT
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees (http://ekhidna.biocenter.helsinki.fi/poxo/multigo). High-throughput gene expression measuring techniques, such as microarrays, are nowadays often used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist the data analysis and interpretation are essential. MultiGO is a tool that automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights to the experiment of interest.


Figure 1: Fisher's combined probability test is used to calculate the overall P-values for the clusters and to estimate the cutting point of the expression tree. The light gray line marks the height evaluated in the example. Light gray boxes show the selected clusters; the P-values used in the calculation are underlined (the P-value of the best GO-term of each selected cluster).

Mentions: The optimal cluster set, i.e. the best cutting level of the expression tree, and the corresponding key biological functions are searched for in MultiGO by calculating the overall statistical significance of the GO-terms of the clusters located at a given level of similarity (Figure 1). This calculation is repeated at every level of similarity in the tree, starting from the single-gene level and ending at the level of one cluster containing all the genes of the experiment. The overall statistical significance is calculated using Fisher's combined probability test (Equation 3, Figure 1) (17). In Equation 3, P_i is the corrected P-value of the most significant GO-term of the ith cluster and k is the number of clusters created at the position or passing it, i.e. clusters created farther from the root and bypassing the position. The overall P-value for the given clusters is then calculated from the test score using the chi-square distribution with 2k degrees of freedom.

$$\chi_F^2 = -2 \sum_{i=1}^{k} \ln P_i \qquad (3)$$

When calculating the significance, P_i is set to one for clusters that do not contain significant GO-terms and for clusters that have been filtered out, i.e. clusters that contain an improper number of genes according to the corresponding parameters (see Supplementary Data for a more detailed description of the parameters). In Fisher's combined probability test, the P-values to be combined (P_i) are assumed to be independent (17). In MultiGO, independence is sought by using non-overlapping clusters and by using only the P-values of the best GO-terms. The use of non-overlapping clusters guarantees that the GO-terms combined in Equation 3 are not influenced by shared genes, whereas choosing only the GO-term with the most significant P-value avoids the correlations between GO-terms that arise from the structure of the DAG.
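The calculation in Equation 3 can be illustrated with a minimal Python sketch. It is not MultiGO's code: the function names `fisher_combined` and `best_cut`, the dictionary that maps a cut height to the P-values of its clusters, and the example numbers are all hypothetical. Picking the level with the smallest overall P-value is also an assumption about how "best cutting level" is decided. The chi-square tail probability uses the standard closed form that holds when the degrees of freedom are even (here 2k), so only the standard library is needed.

```python
import math


def fisher_combined(p_values):
    """Return (test score, overall P-value) for Fisher's combined probability test.

    p_values holds one P_i per cluster at the cut level; clusters without a
    significant GO-term, or filtered clusters, are passed in as 1.0.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    # Equation 3: chi_F^2 = -2 * sum(ln P_i)
    score = -2.0 * sum(math.log(p) for p in p_values)

    # Upper tail of a chi-square distribution with 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = score / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return score, math.exp(-half) * total


def best_cut(levels):
    """Return the cut height whose clusters give the smallest overall P-value.

    levels is a hypothetical mapping: cut height -> list of P_i at that height.
    """
    return min(levels, key=lambda height: fisher_combined(levels[height])[1])


# Made-up example: three candidate cut heights of an expression tree.
example_levels = {
    0.2: [0.01, 1.0, 1.0],  # one enriched cluster, two without significant terms
    0.5: [0.001, 0.02],     # two enriched clusters
    0.9: [0.04],            # a single cluster near the root
}
chosen_height = best_cut(example_levels)
```

Note that a cluster with P_i = 1 adds nothing to the test score but still adds two degrees of freedom, so a level containing many uninformative clusters ends up with a weaker overall P-value than a level with the same enriched clusters and fewer uninformative ones.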

