Knowledge-guided gene ranking by coordinative component analysis.

Wang C, Xuan J, Li H, Wang Y, Zhan M, Hoffman EP, Clarke R - BMC Bioinformatics (2010)

Bottom Line: Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with an improved performance in pathway member identification.


Affiliation: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.

ABSTRACT

Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often ill-suited to identifying biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to fully incorporate the true interplay between biological knowledge and gene expression data.

Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA). COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem that maximizes the coordinative effort, COCA is designed to first extract the coordinative components based on partial guidance from knowledge genes and then rank the genes according to their participation strengths. An embedded bootstrapping procedure is implemented to improve the statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. Finally, the COCA approach was applied to stem cell data to identify biologically relevant genes in signaling pathways. As a result, the COCA approach uncovers novel pathway members that may shed light on pathway deregulation in cancers.

Conclusion: We have developed a new integrative strategy to combine biological knowledge and microarray data for gene ranking. The method uses knowledge genes as guidance to first extract coordinative components, and then ranks the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.



Figure 5: The identified Notch pathway, including several growth factors, transcription factors and oncogenes. Some of the members (e.g., NOTCH3, JAG1, JAG2 and SOX2) are known to be associated with the Notch pathway, while several novel members are revealed by the COCA approach, e.g., the transcription factors TCF4, TBP and PITX2 and the oncogenes MYCN, FGFR1 and CCND1.

Mentions: For each pathway analysis, we generated a gene list of the top 500 probe sets ranked by COCA, and conducted pathway and functional enrichment analysis using DAVID [34] (http://david.abcc.ncifcrf.gov/). The results of GO enrichment analysis are listed in Table 2 for the Notch pathway; the results of enrichment analysis of the other pathways (i.e., the JAK/STAT, TGFβ and WNT pathways) and the detailed gene lists can be found in the Supplemental Tables S1 [Additional file 1] and S3 - S6 [Additional files 2, 3, 4 and 5]. Taking the results for the Notch pathway as an example, we can see from Figure 4 that COCA effectively boosts the ranking of the pathway-related gene set, as compared to conventional approaches such as VR and EDGE [6]. Once the coordinative direction is estimated, we can discover weakly expressed but related genes. While it is well known that many downstream genes have large variation, COCA can boost the ranking of genes with smaller variation but a larger participation value. From pathway enrichment analysis, we can see that VR mainly prioritizes ribosome, cell adhesion and metabolic pathways (Table S7), which are more likely to be downstream of stem cell development. The EDGE-based ranking prioritizes the pathways related to cell communication, focal adhesion and ECM-receptor interaction (Table S8). On the other hand, COCA-based ranking prioritizes many upstream pathways (Table 2), especially several signaling pathways that might be the cause of the downstream pathways identified by VR.
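The contrast between variance-based ranking and ranking by participation along an estimated direction can be illustrated with a small sketch. This is not the COCA optimization itself, which is not specified in this excerpt: as a stand-in, the "coordinative direction" is assumed here to be the leading singular vector of the centered knowledge-gene submatrix, and "participation" is assumed to be the absolute correlation of each gene with that direction. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def variance_rank(X):
    """Gene indices sorted by decreasing variance across samples (VR-style)."""
    return np.argsort(-X.var(axis=1))

def participation_rank(X, knowledge_idx):
    """Rank genes by |correlation| with a component estimated from knowledge genes.

    Assumption: the component is the leading right singular vector of the
    centered knowledge-gene submatrix. This is a simple stand-in, not the
    optimization used by COCA.
    """
    Xc = X - X.mean(axis=1, keepdims=True)            # center each gene
    _, _, vt = np.linalg.svd(Xc[knowledge_idx], full_matrices=False)
    component = vt[0]                                  # pattern over samples
    norms = np.linalg.norm(Xc, axis=1) * np.linalg.norm(component)
    norms[norms == 0] = 1.0                            # guard constant genes
    participation = np.abs(Xc @ component) / norms
    return np.argsort(-participation), participation

# Synthetic data (hypothetical): 200 genes x 30 samples.
rng = np.random.default_rng(0)
pattern = rng.standard_normal(30)
X = rng.standard_normal((200, 30))                     # unrelated background genes
X[:10] = 0.3 * pattern + 0.05 * rng.standard_normal((10, 30))  # weak but coordinated
X[10:20] *= 3.0                                        # high-variance, unrelated

knowledge = [0, 1, 2, 3, 4]                            # genes 5-9 are "unknown" members
vr_top = set(variance_rank(X)[:10].tolist())
order, _ = participation_rank(X, knowledge)
pr_top = set(order[:10].tolist())
print("VR top 10:", sorted(vr_top))
print("Participation top 10:", sorted(pr_top))
```

In this toy setting the weakly expressed genes 5-9, which follow the same pattern as the knowledge genes, rise to the top of the participation ranking, while the variance ranking is dominated by the unrelated high-variance genes.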
The gene list obtained from Notch pathway-guided COCA includes a Notch receptor (NOTCH3) and three ligands (DSL1, JAG1 and JAG2) that can potentially bind to the Notch receptor (Figure 5). The list also includes APH-1, a gene encoding a multipass membrane protein that is required for Notch pathway signaling. In addition, the list includes many transcription factors that are Notch target genes, revealing a signaling cascade that modulates cell fate by further regulating downstream gene expression. For example, SOX2 is a transcription factor closely related to the Notch pathway in the development of the inner ear [35] and neocortex [36]. Functional enrichment analysis gives a global picture in which the top COCA-ranked genes tend to show better functional over-representation than those ranked by VR or EDGE. We also performed Gene Set Enrichment Analysis (GSEA) [37] on the ranked gene lists to further examine whether the ranking significantly promotes the knowledge gene set. In this study we used a web tool, GeneTrail [38], for the GSEA analysis, where the false discovery rate (FDR) was used to correct for multiple hypothesis testing (the FDR threshold was set to 10%). We also set the minimum number of genes per gene set to 10 in order to avoid reporting very small gene sets. We can see from the results (Table 3 and Table S2(a)-(c)) that COCA ranking tends to rank signaling pathways relatively high, while variance-based ranking (VR) mainly boosts ribosome, metabolic pathway and other downstream biological processes (Table 4). None of the signaling pathways from the COCA approach appears in the GSEA results from the VR approach. We also noticed that the JAK-STAT pathway (GSEA FDR = 0.077) was ranked lower than all the other pathways (GSEA FDR = 0.013, 9.71E-05 and 0.042 for Notch, TGF-beta and WNT, respectively).
To understand this, we looked further into the GSEA results from the VR approach, and found that JAK-STAT member genes were significantly enriched at the bottom of the VR ranking list (FDR = 0.0279572), suggesting that most JAK-STAT member genes have a lower expression change (and thus a relatively weak signal). That could explain, at least in part, why the JAK-STAT pathway was ranked lower than the other pathways (i.e., the Notch, TGF-beta and WNT pathways).
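The filtering described above (a minimum gene set size of 10 and an FDR threshold of 10%) can be sketched as follows. The text does not say which FDR procedure GeneTrail applied, so the Benjamini-Hochberg adjustment is assumed here; the function names, gene set names and p-values are hypothetical and are not results from the paper.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

def select_gene_sets(results, fdr_threshold=0.10, min_genes=10):
    """Drop small gene sets, adjust the remaining p-values, keep those under the threshold.

    results: list of (name, n_genes, p_value) tuples.
    """
    kept = [r for r in results if r[1] >= min_genes]
    if not kept:
        return []
    q = benjamini_hochberg([r[2] for r in kept])
    return [(name, float(qv)) for (name, _, _), qv in zip(kept, q)
            if qv <= fdr_threshold]

# Hypothetical enrichment results: (gene set, number of genes, raw p-value).
demo = [("set_A", 25, 0.01), ("set_B", 8, 0.001), ("set_C", 40, 0.04),
        ("set_D", 12, 0.03), ("set_E", 30, 0.5)]
selected = select_gene_sets(demo)
print(selected)
```

Here `set_B` is excluded before correction because it has fewer than 10 genes, so its small p-value does not influence the adjusted values of the remaining sets.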

