Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies.

Martini P, Risso D, Sales G, Romualdi C, Lanfranchi G, Cagnin S - BMC Bioinformatics (2011)

Bottom Line: One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems arise when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using STEPath-computed gene set scores overcomes batch effects in meta-analysis approaches, allowing the direct comparison of different pathologies and different studies at the level of gene set activation.

Affiliation: CRIBI Biotechnology Centre, Department of Biology, University of Padova, via U. Bassi 58/B, 35121 Padova, Italy.

ABSTRACT

Background: In recent decades, microarray technology has spread widely, leading to a dramatic increase in publicly available datasets. The first statistical tools focused on identifying significantly differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. Analysing gene expression in relation to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on detecting coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis makes it possible to compare different studies that address the same biological question, and so to fully exploit public gene expression datasets.

Results: We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. Applied to individual studies, STEPath produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems arise when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis on a leukaemia dataset (an example of an individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, which is closely related to the pathology and which other gene set analysis algorithms do not identify, while the meta-analysis approach on muscular diseases discriminates between related pathologies and correlates similar ones from different studies.

Conclusions: STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and information about the biological function of genes. Using STEPath-computed gene set scores overcomes batch effects in meta-analysis approaches, allowing the direct comparison of different pathologies and different studies at the level of gene set activation.
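
To make the idea of comparing studies at the gene set level concrete, the following is a minimal sketch and not the STEPath procedure itself: the abstract does not say how scores are compared, so the use of Spearman rank correlation is an assumption, as are the function names, the gene set names other than HOX, and all numbers. It models a batch effect in an idealized way, as a study-wide shift and rescaling of the scores, and shows that a rank-based comparison is unaffected by it.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(scores_a, scores_b):
    """Rank correlation over the gene sets scored in both studies."""
    shared = sorted(set(scores_a) & set(scores_b))
    ra = ranks([scores_a[g] for g in shared])
    rb = ranks([scores_b[g] for g in shared])
    return pearson(ra, rb)

# Hypothetical gene set scores for one study (invented values).
study_a = {"HOX": 3.1, "CELL_CYCLE": 0.4, "MUSCLE_CONTRACTION": -1.2, "APOPTOSIS": 0.9}
# Same biology measured elsewhere: idealized batch effect = rescale + shift.
study_b = {g: 2.0 * s + 5.0 for g, s in study_a.items()}

print(spearman(study_a, study_b))
```

Real batch effects are rarely a clean monotone transformation, so this only illustrates why scores summarized per gene set can be easier to compare across studies than raw expression values.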

Figure 1: Enlargement of chromosomal regions related to leukaemia phenotype. Details on imbalanced regions calculated by STEPath chromosome mapping. Blue line represents chromosome profile; red and light green bars represent gene statistic values (d-score). A. Enlargement of the region of chromosome 11 containing the MLL gene (gene highlighted by the circle). B. Enlargement of the region between 20 and 32 Mbp of chromosome 7. This region corresponds to the localization of the HOX gene cluster (cluster highlighted by the circle). C. Enlargement of the region between 51 and 75 Mbp of chromosome 2 corresponding to the MEIS1 windows (gene highlighted by the circle). D. Enlargement of the region of chromosome 15 containing the NG2 gene (gene highlighted by the circle).

Mentions: Using our implementation, we were able to identify a spectrum of possible imbalanced regions across all chromosomes (see Additional file 1; Figure S2). We identified the down-regulation of the region that contains the MLL gene (Figure 1A; Additional file 2; Table S1). MLL is characterized by a chromosome rearrangement, disrupting its correct localization and transcriptional regulation [46].
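
As an illustration of what a chromosome profile over gene statistics could look like, here is a minimal sketch under stated assumptions. The source does not specify the smoothing used by STEPath chromosome mapping, so a fixed-width moving average stands in for it; the function names, window size, threshold and the toy positions and d-scores are all hypothetical.

```python
def chromosome_profile(genes, window_bp):
    """Smooth per-gene d-scores along one chromosome.

    genes: list of (position_bp, d_score) tuples.
    Returns (position_bp, mean d-score of genes within window_bp / 2).
    """
    genes = sorted(genes)
    profile = []
    for pos, _ in genes:
        in_window = [d for p, d in genes if abs(p - pos) <= window_bp / 2]
        profile.append((pos, sum(in_window) / len(in_window)))
    return profile

def imbalanced_regions(profile, threshold):
    """Group consecutive genes whose smoothed score passes the threshold.

    Returns (start_bp, end_bp, "up" | "down") tuples.
    """
    regions = []
    start = end = sign = None
    for pos, value in profile:
        s = 1 if value >= threshold else -1 if value <= -threshold else 0
        if s != sign:
            if sign:  # close the up/down region that was in progress
                regions.append((start, end, "up" if sign > 0 else "down"))
            start, sign = pos, s
        end = pos
    if sign:
        regions.append((start, end, "up" if sign > 0 else "down"))
    return regions

# Toy data: (position in bp, d-score); values are invented.
genes = [
    (1_000_000, 0.1), (2_000_000, -0.2),
    (3_000_000, -2.0), (3_500_000, -1.8), (4_000_000, -2.2),
    (6_000_000, 0.3), (7_000_000, 0.0),
]
profile = chromosome_profile(genes, window_bp=1_000_000)
print(imbalanced_regions(profile, threshold=1.0))
```

In this toy example the three neighbouring genes with strongly negative d-scores form a single down-regulated region, which is the kind of pattern reported above for the region containing MLL.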

