Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: Multiple Factor Analysis approach.

de Tayrac M, Lê S, Aubry M, Mosser J, Husson F - BMC Genomics (2009)

Bottom Line: Genomic analysis will greatly benefit from considering various sources of molecular data globally, together with the related biological knowledge. When applied to genomic and transcriptomic data and the associated Gene Ontology annotations, our method prioritizes the biological processes linked to the experimental settings. Furthermore, it reduces the time and effort needed to analyze large amounts of 'Omics' data.


Affiliation: CNRS UMR 6061, Université de Rennes 1, IFR 140, Faculté de Médecine, CS 34317, 35043 Rennes, France. marie.de-tayrac@univ-rennes1.fr

ABSTRACT

Background: Genomic analysis will greatly benefit from considering various sources of molecular data globally, together with the related biological knowledge. It is thus of great importance to provide integrative approaches that ease the interpretation of microarray data.

Results: Here, we introduce a data-mining approach, Multiple Factor Analysis (MFA), to combine multiple data sets and to add formalized knowledge. MFA is used to jointly analyse the structure emerging from genomic and transcriptomic data sets. The common structures are highlighted, and graphical outputs are provided so that biological meaning can be retrieved easily. Gene Ontology terms are used to build gene modules that are superimposed on the experimentally interpreted plots. Functional interpretations are then supported by a step-by-step sequence of graphical representations.

Conclusion: When applied to genomic and transcriptomic data and the associated Gene Ontology annotations, our method prioritizes the biological processes linked to the experimental settings. Furthermore, it reduces the time and effort needed to analyze large amounts of 'Omics' data.
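The abstract describes MFA only at a high level. As a rough illustration, the sketch below follows the usual textbook formulation of MFA, which is an assumption here rather than something spelled out on this page: each data set is centered, divided by the square root of the first eigenvalue of its own PCA so that no single block dominates, and the weighted blocks are then analysed with one global PCA. The function name `mfa`, the toy matrices `cgh` and `expr`, and their sizes are hypothetical and are not the authors' data or code.

```python
import numpy as np

def mfa(groups, n_components=2):
    """Minimal MFA sketch (assumed textbook formulation, not the authors' code).

    groups: list of (n_samples x n_variables) arrays measured on the same samples.
    """
    n = groups[0].shape[0]
    weighted = []
    for X in groups:
        Xc = X - X.mean(axis=0)                       # center each variable
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]   # largest singular value
        lam1 = s1 ** 2 / n                            # first eigenvalue of the block's own PCA
        weighted.append(Xc / np.sqrt(lam1))           # block now has first eigenvalue 1
    Z = np.hstack(weighted)                           # weighted blocks side by side
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # global PCA via SVD
    eigvals = s ** 2 / n
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on the MFA axes
    explained = eigvals / eigvals.sum()               # share of total variability per axis
    return scores, eigvals[:n_components], explained[:n_components]

# Hypothetical toy data: 20 samples, a 50-variable block and a 200-variable block
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 1))
cgh = latent @ rng.normal(size=(1, 50)) + rng.normal(size=(20, 50))
expr = latent @ rng.normal(size=(1, 200)) + rng.normal(size=(20, 200))

scores, eigvals, explained = mfa([cgh, expr])
print(scores.shape, np.round(eigvals, 3), np.round(explained, 3))
```

With this weighting, the first global eigenvalue lies between 1 and the number of blocks, and values near the upper bound suggest that the first axis is shared by all data sets. Whether variables should also be scaled to unit variance before weighting depends on the data and is left out of this sketch.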


Figure 2: Multi-way glioma data set: the MFA consensus between CGH and expression highlights a partition of gliomas according to the WHO classification. Individuals (tumors) are presented as points on the scatter plot created with the first two main dimensions of MFA. Each individual is colored according to the glioma subtype (WHO classification); mean individuals are also displayed. Projection of the tumors onto PC1 reveals a partition into glioblastomas (GBM) and lower grade gliomas (oligodendrogliomas, astrocytomas, oligoastrocytomas). PC2 mainly stresses differences between astrocytomas (A) and oligodendrogliomas (O). As PC1 and PC2 represent the first two main factors of MFA, they can be interpreted as follows: PC1 summarizes characteristics of glioblastoma, i.e. transcriptional differences existing between glioblastomas and lower grade gliomas; PC2 summarizes characteristics of oligodendrogliomas, as it stresses the differences between glial tumors arising from astrocytic cells and those arising from oligodendroglial cells.

Mentions: MFA is applied to the paired CGH array and microarray glioma data of Bredel et al. [19,20]. The resulting sample plots (33.7% of the total variability) are presented in Figure 2 and Figure 3. The mean representation of the samples according to both the CGH and gene expression data sets is presented in Figure 2. Mean samples are represented by points colored according to the WHO classification of the tumors. Figure 3A shows the partial representation associated with each tumor type (WHO classification: O, oligodendrogliomas; A, astrocytomas; OA, mixed oligo-astrocytomas; and GBM, glioblastomas). This representation is obtained from the consensus between the CGH and expression (eX) points of view (i.e. genome and transcriptome variations). Each tumor type is represented by three points: the consensus between the two points of view and one point for each point of view. Both scatter plots show a well-defined partition of the samples according to the WHO classification. This is particularly true along PC1, which separates glioblastomas (GBM) from lower grade gliomas (O, A, OA). The partial representation (Figure 3A) and the groups representation (Figure 3B) show that this partition exists (i) on PC1 at both the genome and the transcriptome levels and (ii) on PC2 at the genome level only. Indeed, on PC1 the projections of the two partial points (CGH and eX) for each tumor category (O, A, OA and GBM) are always very close, meaning that CGH and eX define similar structures among the tumors on PC1. In the same manner, the projections of the groups CGH and eX on PC1 have coordinates close to 1. On PC2, all the mean individuals from the partial expression representation (eX) are located around the origin, which is not the case for the genomic representation (CGH), meaning that PC2 is specific to the genomic point of view. Likewise, only the projection of the CGH group on PC2 has a coordinate close to 1 (Figure 3B).
Regarding the CGH data, PC2 provides a partition of the histological subtypes and particularly stresses differences between oligodendrogliomas (O) and astrocytomas (A). The one-variable group WHO, which summarizes the tumor classification, is projected as an illustrative group (Figure 3B). Since its coordinate on PC1 is rather high, the structure induced by this group is linked to PC1: the tumor types are well separated along this dimension. Its coordinate on PC2 is also relatively high, showing that the tumor types are also separated on PC2. Based on these graphical outputs, PC1 is linked to glioblastoma characteristics and PC2 corresponds to oligodendroglioma characteristics, as it stresses the differences between these tumors and the other gliomas.
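
The reading of Figure 3 above rests on two quantities: partial points (one per data set, whose average is the consensus point) and group coordinates (between 0 and 1 on each axis). The sketch below shows how these are commonly computed in the standard MFA formulation, which is assumed here since the page does not give the formulas. The function name, the toy blocks `cgh` and `expr`, and the way the second factor is built into the `cgh` block only are all hypothetical and are not the glioma data.

```python
import numpy as np

def partial_points_and_group_coords(groups, n_components=2):
    """Sketch of MFA partial points and group coordinates (assumed standard formulas)."""
    n, n_groups = groups[0].shape[0], len(groups)
    blocks = []
    for X in groups:
        Xc = X - X.mean(axis=0)
        lam1 = np.linalg.svd(Xc, compute_uv=False)[0] ** 2 / n
        blocks.append(Xc / np.sqrt(lam1))          # MFA weighting: first eigenvalue becomes 1
    Z = np.hstack(blocks)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                        # loadings (variables x axes)
    eigvals = s[:n_components] ** 2 / n
    scores = Z @ V                                 # consensus coordinates of the samples

    partial, group_coords = [], []
    start = 0
    for B in blocks:
        stop = start + B.shape[1]
        Vj = V[start:stop]                         # loadings of this block's variables
        partial.append(n_groups * (B @ Vj))        # samples as seen by this block alone
        # inertia of the block's variables along each axis, always in [0, 1]
        group_coords.append(eigvals * (Vj ** 2).sum(axis=0))
        start = stop
    return scores, np.array(partial), np.array(group_coords), eigvals

# Hypothetical toy data: factor f1 is shared, factor f2 is present in the first block only
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(30, 1)), rng.normal(size=(30, 1))
cgh = 3 * (f1 @ rng.normal(size=(1, 50)) + f2 @ rng.normal(size=(1, 50))) + rng.normal(size=(30, 50))
expr = 3 * (f1 @ rng.normal(size=(1, 200))) + rng.normal(size=(30, 200))

scores, partial, group_coords, eigvals = partial_points_and_group_coords([cgh, expr])
print(np.round(group_coords, 2))   # rows: blocks (cgh, expr); columns: axes (PC1, PC2)
```

In this formulation the partial points of a sample average to its consensus point, and the group coordinates on an axis sum to that axis's eigenvalue. An axis on which only one block has a coordinate near 1 is therefore specific to that block, which is the argument used above for PC2 and the CGH data.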

