Knowledge-guided gene ranking by coordinative component analysis.

Wang C, Xuan J, Li H, Wang Y, Zhan M, Hoffman EP, Clarke R - BMC Bioinformatics (2010)

Bottom Line: Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with an improved performance in pathway member identification.


Affiliation: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.

ABSTRACT

Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often poorly suited to identifying biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to fully capture the true interplay between biological knowledge and gene expression data.

Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA). COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem that maximizes the coordinative effort, COCA first extracts the coordinative components under partial guidance from knowledge genes and then ranks the genes according to their participation strengths. An embedded bootstrapping procedure improves the statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance compared with traditional statistical methods. Finally, the COCA approach was applied to stem cell data to identify biologically relevant genes in signaling pathways. The approach uncovers novel pathway members that may shed light on pathway deregulation in cancers.

Conclusion: We have developed a new integrative strategy that combines biological knowledge and microarray data for gene ranking. The method uses knowledge genes as guidance to first extract coordinative components and then ranks the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.




Figure 1: A flowchart of the proposed approach, namely knowledge-guided coordinative component analysis (COCA), for gene ranking. A bootstrapping procedure is designed to increase the confidence in estimating the coordinative component (W) and participation vector (A).

Mentions: A flowchart of the proposed approach is shown in Figure 1. Given a gene expression microarray data set, multiple data sets are first generated through bootstrap resampling of the genes in the array. The bootstrapping procedure is used to overcome the over-fitting problem associated with a small sample size relative to the very high dimensionality of the primary data [10,11]. Each bootstrap-sampled data set is then analyzed by the proposed COCA algorithm. COCA aims to learn a coordinative direction by integrating biological knowledge with gene expression data, such that the knowledge is maximally aligned along that direction. The involvement of each gene in the knowledge network or pathway is estimated from a projection onto the coordinative direction. Finally, the multiple bootstrapped estimates of involvement are merged to create the gene ranking. Note that the COCA software package is available at the following link: http://www.cbil.ece.vt.edu/software.htm.
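
The workflow described above (resample genes, estimate a direction from the knowledge genes, project every gene onto it, merge the bootstrap estimates) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' COCA implementation: the COCA optimization problem is not reproduced here, and the leading singular vector of the knowledge-gene block is swapped in as a simple stand-in for the coordinative direction. The function name `rank_genes`, its parameters, the averaging used to merge bootstrap estimates, and the synthetic data are all hypothetical.

```python
import numpy as np


def rank_genes(X, knowledge_idx, n_boot=50, seed=0):
    """Rank genes by mean absolute projection onto a knowledge-guided direction.

    X             : (n_genes, n_samples) expression matrix.
    knowledge_idx : row indices of the knowledge genes.
    NOTE: an SVD-based stand-in, not the COCA optimization from the paper.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[0]
    knowledge_idx = np.asarray(knowledge_idx)

    # Center each gene's profile across samples.
    Xc = X - X.mean(axis=1, keepdims=True)

    score_sum = np.zeros(n_genes)
    counts = np.zeros(n_genes)

    for _ in range(n_boot):
        # Bootstrap over genes: draw with replacement, then keep the
        # distinct genes that were drawn (a simplification).
        drawn = np.unique(rng.integers(0, n_genes, size=n_genes))
        known = np.intersect1d(drawn, knowledge_idx)
        if known.size < 2:
            continue  # too few knowledge genes to estimate a direction

        # Stand-in for the coordinative direction: the leading right
        # singular vector of the knowledge-gene block (unit norm).
        _, _, vt = np.linalg.svd(Xc[known], full_matrices=False)
        w = vt[0]

        # Involvement of each drawn gene: absolute projection onto w.
        score_sum[drawn] += np.abs(Xc[drawn] @ w)
        counts[drawn] += 1

    # Merge bootstrap estimates by averaging over the runs a gene appeared in.
    mean_score = np.divide(score_sum, counts,
                           out=np.zeros(n_genes), where=counts > 0)
    ranking = np.argsort(-mean_score)  # gene indices, most involved first
    return ranking, mean_score


# Synthetic example: genes 0..29 follow a shared pattern across 20 samples,
# and only genes 0..9 are supplied as knowledge genes.
demo_rng = np.random.default_rng(1)
pattern = demo_rng.normal(size=20)
X_demo = demo_rng.normal(size=(200, 20))
X_demo[:30] = 3.0 * pattern + 0.5 * demo_rng.normal(size=(30, 20))

ranking, scores = rank_genes(X_demo, knowledge_idx=list(range(10)))
print(ranking[:30])
```

In this toy setting the pattern-following genes that were not given as knowledge genes (indices 10 to 29) should also rise toward the top of the ranking, which mirrors the idea of identifying additional pathway members from partial guidance.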

