Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: Multiple Factor Analysis approach.

de Tayrac M, Lê S, Aubry M, Mosser J, Husson F - BMC Genomics (2009)

Bottom Line: Genomic analysis will greatly benefit from considering, in a global way, various sources of molecular data together with the related biological knowledge. When applied to genomic and transcriptomic data and associated Gene Ontology annotations, our method prioritizes the biological processes linked to the experimental settings. Furthermore, it reduces the time and effort needed to analyze large amounts of 'Omics' data.

Affiliation: CNRS UMR 6061, Université de Rennes 1, IFR 140, Faculté de Médecine, CS 34317, 35043 Rennes, France. marie.de-tayrac@univ-rennes1.fr

ABSTRACT

Background: Genomic analysis will greatly benefit from considering, in a global way, various sources of molecular data together with the related biological knowledge. It is thus of great importance to provide integrative approaches that ease the interpretation of microarray data.

Results: Here, we introduce a data-mining approach, Multiple Factor Analysis (MFA), to combine multiple data sets and to add formalized knowledge. MFA is used to jointly analyse the structure emerging from genomic and transcriptomic data sets. The common structures are highlighted, and graphical outputs are provided so that biological meaning becomes easily retrievable. Gene Ontology terms are used to build gene modules that are superimposed on the experimentally interpreted plots. Functional interpretations are then supported by a step-by-step sequence of graphical representations.

Conclusion: When applied to genomic and transcriptomic data and associated Gene Ontology annotations, our method prioritizes the biological processes linked to the experimental settings. Furthermore, it reduces the time and effort needed to analyze large amounts of 'Omics' data.
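
As a rough illustration of how MFA combines several data sets, the sketch below follows the standard MFA recipe: each block of variables is centered, divided by its first singular value so that no block dominates, and the concatenated table is then analysed by a global PCA. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are simulated, the names `cgh`, `expr`, `mfa_weight_blocks` and `global_pca` are hypothetical, and columns are only centered (whether they should also be scaled to unit variance depends on the data and is not specified here).

```python
import numpy as np

def mfa_weight_blocks(blocks):
    """Center each block and divide it by its first singular value (MFA weighting)."""
    n = blocks[0].shape[0]
    weighted = []
    for X in blocks:
        Xc = X - X.mean(axis=0)  # column-centering (assumption: no unit-variance scaling)
        # First singular value of Xc / sqrt(n): square root of the block's first PCA eigenvalue
        s1 = np.linalg.svd(Xc / np.sqrt(n), compute_uv=False)[0]
        weighted.append(Xc / s1)
    return weighted

def global_pca(weighted):
    """PCA of the concatenated weighted blocks; returns eigenvalues, sample scores, loadings."""
    Z = np.hstack(weighted)
    n_rows = Z.shape[0]
    _, d, Vt = np.linalg.svd(Z / np.sqrt(n_rows), full_matrices=False)
    eigvals = d ** 2        # variance carried by each global axis
    scores = Z @ Vt.T       # coordinates of the samples on the global axes
    return eigvals, scores, Vt.T

# Simulated stand-ins for two paired data sets measured on the same samples (hypothetical data)
rng = np.random.default_rng(0)
n = 20
shared = rng.normal(size=(n, 1))
cgh = shared @ rng.normal(size=(1, 50)) + rng.normal(size=(n, 50))
expr = 10 * (shared @ rng.normal(size=(1, 8)) + rng.normal(size=(n, 8)))

weighted = mfa_weight_blocks([cgh, expr])
eigvals, scores, loadings = global_pca(weighted)
explained = eigvals / eigvals.sum()  # share of total variability per axis
```

After weighting, the first eigenvalue of each block equals 1, so the first global eigenvalue lies between 1 and the number of blocks; a value near the upper bound suggests a first axis common to all blocks.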

Figure 3: Multi-way glioma data set: Characteristics of glioblastoma are linked to CGH and expression data, whereas characteristics of oligodendrogliomas are mostly related to CGH data. The partial representation of the mean individuals (CGH and eX) for each WHO tumor type (A) and the group representation (B) are displayed. (A) The balanced representation of each category is located at the exact barycenter of its partial points, one per point of view (CGH, linked by a solid line; eX, linked by a dotted line). The projections onto PC1 of the partial representations for each category (CGH and eX for oligodendrogliomas, astrocytomas, oligoastrocytomas and glioblastomas) are very close; the partition of the tumors according to the WHO classification is thus shared by the genome and the transcriptome. On PC2, all the mean individuals from the partial expression representation (eX) are located around the origin, which is not the case for the genomic one (CGH). PC2 is therefore specific to the genomic point of view and is not shared by the expression one. This is confirmed by the group representation (B): the projections of the CGH and eX groups are close along PC1, but only the CGH group has a value close to 1 on PC2.

Mentions: MFA is applied to the paired CGH array and microarray glioma data of Bredel et al. [19,20]. The resulting sample plots (33.7% of the total variability) are presented in Figure 2 and Figure 3. Figure 2 presents the mean representation of the samples according to both the CGH and gene expression data sets; mean samples are represented by points colored according to the WHO classification of the tumors. Figure 3A shows the partial representation associated with each type of tumor (WHO classification: O, oligodendrogliomas; A, astrocytomas; OA, mixed oligo-astrocytomas; GBM, glioblastomas). This representation is obtained from the consensus between the CGH and expression (eX) points of view (i.e. genome and transcriptome variations). Each type of tumor is represented by three points: the consensus between the two points of view and one point for each point of view. Both scatter plots show a well-defined partition of the samples according to the WHO classification. This is particularly true along PC1, which highlights a partition of the samples into glioblastomas (GBM) and lower-grade gliomas (O, A, OA). The partial representation (Figure 3A) and the group representation (Figure 3B) show that this partition exists (i) on PC1 at both the genome and transcriptome levels and (ii) on PC2 only at the genome level. Indeed, the projections on PC1 of the partial points for each category of tumors (CGH and eX for O, A, OA and GBM) are consistently very close, meaning that CGH and eX define similar structures among the tumors on PC1. In the same manner, the projections of the CGH and eX groups on PC1 have coordinates close to 1. On PC2, all the mean individuals from the partial expression representation (eX) are located around the origin, which is not the case for the genomic one (CGH), meaning that PC2 is specific to the genomic point of view. Likewise, only the projection of the CGH group on PC2 has a coordinate close to 1 (Figure 3B).
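
The partial points and group coordinates discussed above can be sketched numerically. The example below is a minimal sketch on simulated data, not the analysis of the glioma data set: it assumes the standard MFA definitions, in which the partial point of a sample for one group is obtained by projecting that group's weighted block alone (multiplied by the number of groups), and the coordinate of a group on an axis is the inertia of its weighted variables along that axis. The function `mfa`, the simulated blocks and the `labels` vector are hypothetical.

```python
import numpy as np

def mfa(blocks):
    """Minimal MFA: eigenvalues, consensus scores, partial scores, group coordinates."""
    n, J = blocks[0].shape[0], len(blocks)
    weighted = []
    for X in blocks:
        Xc = X - X.mean(axis=0)                      # center columns
        s1 = np.linalg.svd(Xc / np.sqrt(n), compute_uv=False)[0]
        weighted.append(Xc / s1)                     # first eigenvalue of each block becomes 1
    Z = np.hstack(weighted)
    _, d, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
    V, eigvals = Vt.T, d ** 2
    consensus = Z @ V                                # balanced (mean) representation
    partial, group_coord, start = [], [], 0
    for W in weighted:
        Vj = V[start:start + W.shape[1]]             # loadings of this group's variables
        partial.append(J * (W @ Vj))                 # partial points for this point of view
        group_coord.append(eigvals * (Vj ** 2).sum(axis=0))  # inertia of the group per axis
        start += W.shape[1]
    return eigvals, consensus, np.array(partial), np.array(group_coord)

# Simulated blocks: one factor shared by both blocks, one factor present in the first block only
rng = np.random.default_rng(1)
n = 40
shared, first_only = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
cgh = (shared @ rng.normal(size=(1, 30)) + first_only @ rng.normal(size=(1, 30))
       + 0.3 * rng.normal(size=(n, 30)))
expr = shared @ rng.normal(size=(1, 30)) + 0.3 * rng.normal(size=(n, 30))

eigvals, consensus, partial, group_coord = mfa([cgh, expr])

# Mean individuals per (hypothetical) tumor type, for the consensus and for each point of view
labels = np.repeat(["O", "A", "OA", "GBM"], 10)
mean_consensus = {t: consensus[labels == t].mean(axis=0) for t in np.unique(labels)}
mean_partial = {t: partial[:, labels == t].mean(axis=1) for t in np.unique(labels)}

print(group_coord[:, :2])  # rows: groups (first block, second block); columns: PC1, PC2
```

By construction, each consensus point is the barycenter of its partial points, and each group coordinate lies between 0 and 1. In a simulation of this kind one would expect both groups to have a high coordinate on the shared axis and only the first group to have a high coordinate on the axis specific to it, which is the pattern described for CGH and eX.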
Regarding CGH data, PC2 provides a partition of the histological subtypes and particularly stresses the differences between oligodendrogliomas (O) and astrocytomas (A). The one-variable group WHO, which summarizes the tumor classification, is projected as an illustrative group (Figure 3B). Since its coordinate on PC1 is rather high, the structure induced by this group is linked to PC1: the types of tumors are well separated along this dimension. Its coordinate on PC2 is also relatively high, showing that the types of tumors are also separated on PC2. From the examination of these graphical outputs, PC1 is linked to glioblastoma characteristics, and PC2 corresponds to oligodendroglioma characteristics, as it stresses the differences between these tumors and the other gliomas.
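
One simple way to quantify how well a categorical variable such as the tumor type is separated along an axis is the correlation ratio: the share of the variance of the sample scores that is explained by the categories. The sketch below is an illustration under assumptions; the function `correlation_ratio` and the toy scores are hypothetical, and it is not claimed that this is exactly the quantity plotted for the illustrative group in Figure 3B.

```python
import numpy as np

def correlation_ratio(scores, labels):
    """Between-category variance of the scores divided by their total variance (0 to 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    grand_mean = scores.mean()
    total = ((scores - grand_mean) ** 2).sum()
    between = sum(
        (labels == c).sum() * (scores[labels == c].mean() - grand_mean) ** 2
        for c in np.unique(labels)
    )
    return between / total

# Toy scores on one axis for six samples of three hypothetical tumor types
labels = ["O", "O", "A", "A", "GBM", "GBM"]
pc_scores = [-1.0, -1.2, -0.8, -1.0, 2.0, 2.1]
ratio = correlation_ratio(pc_scores, labels)
```

A value close to 1 means that the categories are well separated along the axis; a value close to 0 means that the axis carries little information about the categories.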

