Combining transcriptional datasets using the generalized singular value decomposition.

Schreiber AW, Shirley NJ, Burton RA, Fincher GB - BMC Bioinformatics (2008)

Bottom Line: Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-beta-D-glucan polysaccharide found in plant cell walls. We believe that it will prove to be particularly useful for exploiting large, publicly available microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.


Affiliation: Australian Centre for Plant Functional Genomics, School of Agriculture and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA 5064, Australia. andreas.schreiber@adelaide.edu.au

ABSTRACT

Background: Both microarrays and quantitative real-time PCR are convenient tools for studying the transcriptional levels of genes. The former is preferable for large-scale studies, while the latter is a more targeted technique. Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. These difficulties are exacerbated if there is only partial overlap between the experimental conditions and genes probed in the two datasets.

Results: We show here that the generalized singular value decomposition provides a practical tool for merging a small, targeted dataset obtained by quantitative real-time PCR of specific genes with a much larger microarray dataset. The technique permits, for the first time, the identification of genes present in only one dataset that are co-expressed with a target gene present exclusively in the other dataset, even when the experimental conditions for the two datasets are not identical. With the rapidly increasing number of publicly available large-scale microarray datasets, such non-identical conditions are frequently the case. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-beta-D-glucan polysaccharide found in plant cell walls.

Conclusion: We show that the generalized singular value decomposition provides a viable tool for a combined analysis of two gene expression datasets with only partial overlap of both gene sets and experimental conditions. We illustrate how the decomposition can be optimized self-consistently by using a judicious choice of genes to define it. The ability of the technique to seamlessly define a concept of "co-expression" across both datasets provides an avenue for meaningful data integration. We believe that it will prove to be particularly useful for exploiting large, publicly available microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.
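The abstract does not spell out how the decomposition is computed. As a hedged illustration only, the GSVD of two matrices A (m x n) and B (p x n) that share their n columns can be obtained from a thin QR factorization of the stacked matrix [A; B], followed by an ordinary SVD of its top block. This is a standard construction related to the cosine-sine decomposition. It yields A = U1 diag(c) X^T and B = U2 diag(s) X^T, with a common right factor X^T and c^2 + s^2 = 1, so the ratios c/s indicate how strongly each shared pattern is expressed in one dataset relative to the other. The function name `gsvd`, the matrix orientation, and the assumptions that both matrices have full column rank with m, p >= n are choices made for this sketch, not details from the paper. Which genes and conditions the authors use to define the decomposition is described in the full text.

```python
import numpy as np


def gsvd(A, B):
    """Sketch of the generalized SVD of A (m x n) and B (p x n).

    Returns U1, U2, c, s, Xt such that
        A = U1 @ diag(c) @ Xt,   B = U2 @ diag(s) @ Xt,   c**2 + s**2 = 1.
    Assumption for this sketch: m >= n, p >= n, and A and B each have full
    column rank n, so every c_i and s_i is strictly positive.
    """
    m, n = A.shape
    p, n_b = B.shape
    if n != n_b:
        raise ValueError("A and B must have the same number of columns")
    if m < n or p < n:
        raise ValueError("this sketch assumes m >= n and p >= n")

    # Step 1: thin QR of the stacked matrix, [A; B] = Q R (Q has n columns).
    Q, R = np.linalg.qr(np.vstack([A, B]))
    Q1, Q2 = Q[:m], Q[m:]

    # Step 2: SVD of the top block, Q1 = U1 diag(c) W^T (W is n x n orthogonal).
    U1, c, Wt = np.linalg.svd(Q1, full_matrices=False)

    # Step 3: because Q1^T Q1 + Q2^T Q2 = I, the columns of Q2 W are mutually
    # orthogonal with norms s_i = sqrt(1 - c_i^2); normalizing them gives U2.
    Q2W = Q2 @ Wt.T
    s = np.linalg.norm(Q2W, axis=0)
    U2 = Q2W / s

    # Shared right factor: A = Q1 R = U1 C (W^T R) and B = Q2 R = U2 S (W^T R).
    Xt = Wt @ R
    return U1, U2, c, s, Xt


if __name__ == "__main__":
    # Random toy matrices for illustration only; these are not data from the paper.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(11, 5))
    B = rng.normal(size=(40, 5))
    U1, U2, c, s, Xt = gsvd(A, B)
    print(np.allclose(A, U1 @ np.diag(c) @ Xt), np.allclose(B, U2 @ np.diag(s) @ Xt))
    print("generalized singular values c/s:", c / s)
```

Rank-deficient inputs (where some s_i or c_i vanish) need a full GSVD routine such as LAPACK's xGGSVD3 family; the division in step 3 is only safe under the full-rank assumption stated above.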

Figure 6: Confirmation of co-expression of HvCslF3 and candidate genes. In panel A the Q-PCR expression profiles of the cellulose synthase-like gene HvCslF3 and the candidate genes identified in this study are compared. Expression profiles have been standardized as described in the text. As can be seen, the genes indeed co-express in the tissues probed in both the Q-PCR and microarray datasets. Panel B shows an additional comparison of Q-PCR coleoptile time course expression profiles of HvCslF3 and Contig11619. The two genes appear to remain roughly co-expressed in this time-course as well.

Mentions: To confirm the apparent co-regulation of HvCslF3 with this selection of genes probed by the microarray, primers were designed so that the transcript abundance of each gene in the 11 barley tissues of the Q-PCR dataset could be checked directly by Q-PCR. The resulting expression profile of the most consistently co-expressed candidate (correlation coefficient 0.72), the putative ceramide glucosyltransferase Contig11619, is shown in red in Figure 6A alongside the corresponding expression profile of HvCslF3 (black). This confirms that the GSVD procedure has correctly identified a previously unknown gene co-expressed with this cellulose synthase-like gene. Similar cross-checks were carried out for Contig14830 (correlation coefficient 0.29), Contig16931 (0.68), Contig15434 (0.75) and Contig18825 (0.71); the last two were already present in the Q-PCR dataset (i.e. region A). Their expression profiles are also shown in Figure 6A. All except Contig14830 show significant co-expression with HvCslF3 in the tissues probed by the Q-PCR dataset.
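Figure 6A compares expression profiles after standardization, and the text reports a correlation coefficient for each candidate. As a minimal sketch, assuming z-score standardization (the paper describes its own standardization in the main text) and an ordinary Pearson correlation across the tissues in which both genes were measured, candidates can be ranked against a target profile such as that of HvCslF3. How the reported coefficients (e.g. 0.72 for Contig11619) were derived within the GSVD framework is not reproduced here; the function names and any profiles passed to them are hypothetical.

```python
import numpy as np


def standardize(profile):
    """Z-score a profile: subtract its mean, divide by its population std.

    Assumed standardization for this sketch; the paper may use a different one.
    """
    x = np.asarray(profile, dtype=float)
    return (x - x.mean()) / x.std()


def pearson(x, y):
    """Pearson correlation of two profiles measured on the same tissues.

    With population (ddof=0) standard deviations, the mean of the product of
    the two z-scored profiles equals the Pearson correlation coefficient.
    """
    return float(np.mean(standardize(x) * standardize(y)))


def rank_candidates(target, candidates):
    """Return (name, r) pairs sorted from most to least correlated with target.

    `candidates` maps a (hypothetical) gene label to its expression profile,
    ordered over the same tissues as `target`.
    """
    scores = {name: pearson(target, prof) for name, prof in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Using ddof=0 consistently in both the z-score and the averaging step makes `pearson` agree with `np.corrcoef`; mixing ddof=0 and ddof=1 would scale the result by (N - 1)/N.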