Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana.

Kankainen M, Brader G, Törönen P, Palva ET, Holm L - Nucleic Acids Res. (2006)

Bottom Line: The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.

Affiliation: Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland.

ABSTRACT
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees (http://ekhidna.biocenter.helsinki.fi/poxo/multigo). High-throughput gene expression measuring techniques, such as microarrays, are now widely used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist data analysis and interpretation are essential. MultiGO automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.
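To make the idea of an "enriched biological function" concrete, the following is a minimal sketch of how the enrichment of one Gene Ontology term in one cluster could be scored. It assumes an upper-tail hypergeometric test followed by a Benjamini-Hochberg false discovery rate adjustment; the abstract does not state which test or correction is used, and the function names and toy numbers are hypothetical, not part of MultiGO.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_annotated, cluster_size, hits):
    """Upper-tail hypergeometric P-value, P(X >= hits).

    n_genes      -- genes on the array (assumed background)
    n_annotated  -- background genes carrying the GO term
    cluster_size -- genes in the cluster under test
    hits         -- cluster genes carrying the GO term
    """
    max_hits = min(n_annotated, cluster_size)
    total = comb(n_genes, cluster_size)
    tail = sum(
        comb(n_annotated, k) * comb(n_genes - n_annotated, cluster_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy numbers: 10 genes, 4 annotated, cluster of 3, all 3 annotated.
    p = hypergeom_enrichment_p(10, 4, 3, 3)
    print(p, benjamini_hochberg([p, 0.5, 0.2]))
```

In a tree-wide analysis, a test of this kind would be repeated for every cluster and every annotated term, which is why a multiple-testing correction is needed before any cut-off is applied.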

Figure 2: The behaviour of the overall P-values in expression trees created using different linkage methods and distance metrics. In the figure, a, c and s denote average, complete and single linkage, and e and p denote Euclidean and Pearson correlation coefficient distances. The y-axis is the −log10 value of the overall P-value and the x-axis is the position in the tree.

Mentions: The effect of different linkage methods and distance metrics on the overall P-values, and on the selection of the expression tree cutting point, was investigated by clustering the expression data with each combination of average, complete and single linkage and of Euclidean and Pearson correlation coefficient distances. The parameters used to analyse these trees were chosen on the basis of the parameter analyses and by preferring the statistical methods considered most reliable: HD was used instead of BD, FDR correction was applied with a rather conservative P-value cut-off (0.001), all genes were analysed and no filtering was performed. The performance of the different HC combinations is shown in Figure 2. For this expression data, each linkage method achieves its best result when the Pearson correlation coefficient is used as the distance, which agrees well with the distance metric assumption made in the original HC study (4). Of the linkage methods, average linkage yields the most significant overall P-value. However, the complete and average linkage methods are almost indistinguishable: their most significant overall P-values were 4 × 10^−68 and 3 × 10^−70, indicating that these linkage methods perform equally well. This is consistent with the previous study, which showed that average and complete linkage outperform single linkage (6).
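The comparison described above can be illustrated with a small sketch that builds a hierarchical clustering for every combination of linkage method and distance metric. This is a naive pure-Python agglomerative procedure on hypothetical toy profiles, not the clustering software used in the study. It also does not compute the overall P-value, and the Pearson distance is assumed here to be one minus the correlation coefficient.

```python
from itertools import combinations, product
from math import sqrt


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed form; undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


# Each linkage reduces the list of pairwise distances between two clusters.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}
DISTANCES = {"euclidean": euclidean, "pearson": pearson_distance}


def cluster(profiles, linkage, distance):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, height).
    """
    link, dist = LINKAGES[linkage], DISTANCES[distance]
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = link([dist(profiles[i], profiles[j]) for i in a for j in b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, _ = best
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(best)
    return merges


# Hypothetical expression profiles (rows are genes, columns are conditions).
profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [10, 20, 30]]

if __name__ == "__main__":
    for linkage, distance in product(LINKAGES, DISTANCES):
        heights = [round(h, 3) for _, _, h in cluster(profiles, linkage, distance)]
        print(linkage, distance, heights)
```

On these toy profiles the two metrics disagree about which genes are similar: Euclidean distance groups genes by magnitude, whereas the Pearson-based distance groups genes whose profiles have the same shape regardless of scale. That difference is one reason the choice of metric can change the resulting tree.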

