Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana.

Kankainen M, Brader G, Törönen P, Palva ET, Holm L - Nucleic Acids Res. (2006)

Bottom Line: The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.


Affiliation: Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland.

ABSTRACT
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees (http://ekhidna.biocenter.helsinki.fi/poxo/multigo). High-throughput gene expression measurement techniques, such as microarrays, are nowadays often used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist data analysis and interpretation are essential. MultiGO automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.
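The abstract says MultiGO reports clusters that share an enriched Gene Ontology function, but this section does not state which enrichment statistic is used. As an illustrative assumption only, the sketch below scores one GO term in one cluster with a one-sided hypergeometric test, a common choice for this kind of over-representation question. The function name and its parameters are hypothetical and are not MultiGO's API.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical helper, not taken from MultiGO:
      k: genes in the cluster annotated with the GO term
      n: number of genes in the cluster
      K: genes in the whole tree annotated with the GO term
      N: number of genes in the whole tree
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # the cluster cannot contain more annotated genes than this
    # Sum the exact hypergeometric probabilities of k, k+1, ..., upper annotated genes.
    # comb() returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic above; true division gives a correctly rounded float.
    return tail / comb(N, n)
```

Using exact integer binomial coefficients avoids rounding in the tail sum; for very large gene universes a library routine working in log space would be faster.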


Figure 3. Reliability of the overall P-values estimated using random permutations. For each position in the tree, the most significant overall P-value was chosen from the randomised data. The light grey area indicates tree positions where the random analyses yielded less significant overall P-values, and the dark grey area indicates positions where all random analyses yielded equally or more significant overall P-values. The left y-axis shows the −log10 value of the overall P-value and the x-axis shows the position in the tree.
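The caption compares the overall P-value at each tree position with the most significant value obtained from randomised data. This section does not say how the data were randomised or how many permutations were run, so the sketch below covers only the comparison step, assuming the scores have already been computed. The function name and the input layout (one list of −log10 scores per randomised run) are hypothetical.

```python
from typing import Sequence


def permutation_envelope(
    observed: Sequence[float],
    randomized: Sequence[Sequence[float]],
) -> list[tuple[float, bool]]:
    """Compare observed scores with the best randomised score per tree position.

    Hypothetical helper. Scores are -log10 overall P-values, so larger
    means more significant. observed[j] is the score at tree position j
    for the real data; randomized[r][j] is the score at position j in
    randomised run r. Returns, per position, the best randomised score and
    whether the observed score strictly exceeds it.
    """
    if not randomized:
        raise ValueError("need at least one randomised run")
    n_pos = len(observed)
    if any(len(run) != n_pos for run in randomized):
        raise ValueError("every randomised run must cover every tree position")
    envelope = []
    for j in range(n_pos):
        # Most significant randomised score at this position (the envelope).
        best_random = max(run[j] for run in randomized)
        envelope.append((best_random, observed[j] > best_random))
    return envelope
```

Positions flagged `True` are those where the real tree beats every randomised run, which is the situation the figure is meant to expose.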

A set of candidate clusters was obtained by cutting the expression tree at the position with the most significant overall P-value. Figure 3 shows the overall P-values as a function of their position in the tree. The overall P-value is obtained using Fisher's combined probability test, and cutting at its most significant value selects a set of clusters such that moving the cut nearer to the root would merge them into biologically meaningless clusters, whereas moving it farther from the root would split clusters with similar functions into several small clusters. The most significant overall P-value (3e−70) is located at height 0.73 (Figure 3), where the tree contains 42 clusters in total, 14 of which have a significant GO term associated with them (Table 1). Expression profiles of these 14 significant clusters are shown as heatmaps in Supplementary Figures 8–21. The genes in these 14 clusters are involved in different biological functions, which are likely to be the key functions affected by the experiment. For example, the experiments included ABA and environmental abiotic treatments, which are listed in Table 1 as response to ABA (Node_11808) and response to heat (Node_11805). Besides these abiotic stress responses, clusters related to defense are also listed, such as defense response (Node_11830) and response to wounding (Node_11791). Figure 3 indicates a second potential cutting point in the expression tree at height 0.61. At this position, the overall P-value (1e−65) is almost as significant as at the most significant position of the tree. The clusters at this position are also involved in largely the same functions (9 of the 14 functions reported at height 0.73 are reported here as well), including functions related to the probed treatments, such as response to ABA and response to heat.
These findings indicate that a set of biologically meaningful clusters can indeed be captured by using overall P-values.
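The text states that the overall P-value comes from Fisher's combined probability test and that the tree is cut where this value is most significant. The sketch below illustrates both steps under stated assumptions: it treats the per-cluster P-values as independent, and it hypothetically takes one P-value per cluster at each candidate cut (for example, each cluster's best GO-term P-value). The section does not say exactly which P-values enter the combination, and the function names and input mapping are illustrative, not MultiGO's API. Results are returned as −log10 values to match the y-axis of Figure 3 and to avoid floating-point underflow at values such as 3e−70.

```python
import math
from typing import Mapping, Sequence


def fisher_neg_log10_p(p_values: Sequence[float]) -> float:
    """-log10 of Fisher's combined P-value for independent P-values.

    Fisher's statistic X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom under the null. For even
    degrees of freedom the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!,
    which is evaluated here in log space so tiny P-values do not underflow.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    t = -sum(math.log(p) for p in p_values)  # t = x / 2
    if t == 0.0:
        return 0.0  # every P-value equals 1, so the combined P-value is 1
    # log of each term (x/2)^j / j!, combined with a log-sum-exp.
    log_terms = [j * math.log(t) - math.lgamma(j + 1) for j in range(len(p_values))]
    m = max(log_terms)
    log_sum = m + math.log(sum(math.exp(v - m) for v in log_terms))
    log_p = -t + log_sum  # natural log of the combined P-value
    return -log_p / math.log(10)


def best_cut(cluster_p_by_height: Mapping[float, Sequence[float]]) -> tuple[float, float]:
    """Return (height, -log10 overall P) for the most significant cut.

    Hypothetical input: each candidate cut height maps to one P-value per
    cluster present at that cut.
    """
    scored = {h: fisher_neg_log10_p(ps) for h, ps in cluster_p_by_height.items()}
    height = max(scored, key=scored.get)
    return height, scored[height]
```

For instance, `best_cut({0.5: [0.5, 0.5], 0.9: [1e-5, 0.01]})` selects height 0.9; these are toy numbers, not values from the study. Fisher's method assumes independent tests, which holds only approximately when GO annotations are shared across clusters.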

