Combining transcriptional datasets using the generalized singular value decomposition.

Schreiber AW, Shirley NJ, Burton RA, Fincher GB - BMC Bioinformatics (2008)

Bottom Line: Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-beta-D-glucan polysaccharide found in plant cell walls. We believe that it will prove to be particularly useful for exploiting large, publicly available microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.


Affiliation: Australian Centre for Plant Functional Genomics, School of Agriculture and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA 5064, Australia. andreas.schreiber@adelaide.edu.au

ABSTRACT

Background: Both microarrays and quantitative real-time PCR are convenient tools for studying the transcriptional levels of genes. The former is preferable for large-scale studies, while the latter is a more targeted technique. Because of platform-dependent systematic effects, simple comparisons or merging of datasets obtained by these technologies are difficult, even though they may often be desirable. These difficulties are exacerbated if there is only partial overlap between the experimental conditions and genes probed in the two datasets.

Results: We show here that the generalized singular value decomposition provides a practical tool for merging a small, targeted dataset obtained by quantitative real-time PCR of specific genes with a much larger microarray dataset. The technique permits, for the first time, the identification of genes present in only one dataset that are co-expressed with a target gene present exclusively in the other dataset, even when experimental conditions for the two datasets are not identical. With the rapidly increasing number of publicly available large-scale microarray datasets, such non-identical conditions are frequently the case. The method enables us to discover putative candidate genes involved in the biosynthesis of the (1,3;1,4)-beta-D-glucan polysaccharide found in plant cell walls.
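
For readers unfamiliar with the decomposition itself, the following is a minimal sketch of a generalized singular value decomposition of two matrices that share their column dimension, built from a QR factorization followed by an ordinary SVD. It is not the authors' implementation. The function name gsvd, the variable names, and the restriction that each matrix has at least as many rows as columns are assumptions made for illustration, and how the rows and columns map onto genes and experimental conditions is left open here.

```python
import numpy as np


def gsvd(a, b):
    """Sketch of a GSVD for a (m1 x n) and b (m2 x n).

    Returns u, v, c, s, x such that
        a = u @ diag(c) @ x.T   and   b = v @ diag(s) @ x.T,
    with c**2 + s**2 == 1 elementwise.
    Assumption: the stacked matrix [a; b] has full column rank and
    both m1 >= n and m2 >= n (a simplification for this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m1, n = a.shape
    if b.shape[1] != n:
        raise ValueError("a and b must have the same number of columns")
    if m1 < n or b.shape[0] < n:
        raise ValueError("this sketch assumes each matrix has at least n rows")

    # Thin QR of the stacked data: [a; b] = q @ r, with r of shape (n, n).
    q, r = np.linalg.qr(np.vstack([a, b]))
    q1, q2 = q[:m1], q[m1:]

    # Ordinary SVD of the top block: q1 = u @ diag(c) @ zt.
    u, c, zt = np.linalg.svd(q1, full_matrices=False)

    # The columns of q2 @ z are mutually orthogonal; their norms are s.
    w = q2 @ zt.T
    s = np.linalg.norm(w, axis=0)
    v = np.divide(w, s, out=np.zeros_like(w), where=s > 1e-12)

    # Shared right-hand factor, common to both datasets.
    x = (zt @ r).T
    return u, v, c, s, x


# Synthetic stand-ins for two expression matrices (random numbers, not real data).
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 4))
b = rng.normal(size=(5, 4))
u, v, c, s, x = gsvd(a, b)
```

The columns of x are shared by both factorizations, which is what allows a pattern found in one dataset to be compared with the other. The ratio c / s gives the generalized singular values: a large ratio marks a shared pattern that is relatively more prominent in the first matrix, a small ratio one that is more prominent in the second.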

Conclusion: We show that the generalized singular value decomposition provides a viable tool for a combined analysis of two gene expression datasets with only partial overlap of both gene sets and experimental conditions. We illustrate how the decomposition can be optimized self-consistently by using a judicious choice of genes to define it. The ability of the technique to seamlessly define a concept of "co-expression" across both datasets provides an avenue for meaningful data integration. We believe that it will prove to be particularly useful for exploiting large, publicly available, microarray datasets for species with unsequenced genomes by complementing them with more limited in-house expression measurements.



Figure 2: Using the GSVD to identify candidate co-expressing genes. This schematic flowchart shows the procedures used to identify a) an overlapping region between the two datasets as well as b) candidate genes probed by the microarray co-expressing with genes of interest from the Q-PCR dataset. Regions A, B and C refer to those defined in Fig. 1. In order to reduce the number of false positives we have repeated the entire procedure a number of times and only examine in detail genes that co-express consistently among these repeats.
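
The final filtering step described in the caption, keeping only genes that co-express consistently across repeated runs, could be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name consistent_candidates, the min_fraction threshold, and the gene labels g1 to g4 are hypothetical placeholders, and the caption does not say how many repeats were used or what level of agreement was required.

```python
from collections import Counter


def consistent_candidates(runs, min_fraction=1.0):
    """Return genes reported in at least min_fraction of the repeated runs.

    runs: a list with one collection of candidate gene identifiers per repeat.
    min_fraction: hypothetical agreement threshold in (0, 1]; 1.0 keeps only
    genes found in every repeat.
    """
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must lie in (0, 1]")
    if not runs:
        return []
    # Count each gene at most once per run.
    counts = Counter(gene for run in runs for gene in set(run))
    needed = min_fraction * len(runs)
    return sorted(gene for gene, k in counts.items() if k >= needed)


# Placeholder identifiers, not genes from the paper.
runs = [{"g1", "g2", "g3"}, {"g1", "g3"}, {"g1", "g4"}]
strict = consistent_candidates(runs)        # genes present in all three runs
relaxed = consistent_candidates(runs, 0.5)  # genes present in at least half
```

Requiring agreement across repeats trades sensitivity for fewer false positives, which matches the motivation given in the caption.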

Mentions: These features suggest an iterative approach, illustrated in Fig. 2, for using the GSVD in a search for co-expressed genes across the two datasets. This approach is described in detail in the following sections.

