Limits...
Gene and pathway identification with Lp penalized Bayesian logistic regression.

Liu Z, Gartenhaus RB, Tan M, Jiang F, Jiao X - BMC Bioinformatics (2008)

Bottom Line: It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance.The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation.We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA. zliu@umm.edu

ABSTRACT

Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in recent years in the area of bioinformatics and computational biology. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together on the same pathway biologically. Finally, in sparse logistic regression with Lp (p < 1) penalty, the degree of the sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time consuming.

Results: In this paper, we proposed a simple Bayesian approach to integrate the regularization parameter out analytically using a new prior. Therefore, there is no longer a need for parameter selection, as it is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes. Pathways of those correlated genes were identified with DAVID http://david.abcc.ncifcrf.gov/.

Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be utilized to identify important genes and pathways that are related to cancer and build a parsimonious model for future patient predictions.

Show MeSH

Related in: MedlinePlus

Axon Guidance. Axon guidance pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2567351&req=5

Figure 4: Axon Guidance. Axon guidance pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.

Mentions: BLpLog identifies 8 highly differentiated genes with the test AUC = 0.93 with the computational time of 3.2 minutes. BLogReg selects 13 genes with the predicted AUC = 0.90 and computational time of 1.6 minutes. SparseLOGREG identifies 10 genes with the predicted AUC = 0.936 and computational time of 127 minutes (2 hours). The selected genes with different methods are not completely the same but highly correlated. Again with the minimal difference in performance and big differences in computational time between BLpLog and SparseLOGREG, obviously BLpLog is preferred. The highly correlated genes and corresponding pathways are selected with the integrated algorithm. Table 3 and Table 4 are the computational results. Seven pathways was identified from the data. The plots of Antigen Processing and Presentation and Axon Guidance pathways are shown in Figure 3 – Figure 4. All eight pathways identified are important in hepatocellular carcinoma and other cancers. For example, PURINE METABOLISM pathway is one of the metabolism pathways involved in nucleotide synthesis and degradation, amino acid catabolism, non-essential amino acid synthesis and the urea cycle. Understanding the mechanism involved in metabolic regulation has important implications in both biotechnology and medicine. It is estimated that at least a third of all serious health problems are caused by metabolic disorders. Analyzing differentiated expressed genes on the pathway may provide some insight on the early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Other pathways such as CYTOKINE-CYTOKINE RECEPTOR INTERACTION, NEUROACTIVE LIGAND-RECEPTOR INTERACTION, MAPK SIGNALING PATHWAY, and GAP JUNCTION are all hepatocellular carcinoma related. We will discuss the two pathways ANTIGEN PROCESSING AND PRESENTATION and AXON GUIDANCE in more details.


Gene and pathway identification with Lp penalized Bayesian logistic regression.

Liu Z, Gartenhaus RB, Tan M, Jiang F, Jiao X - BMC Bioinformatics (2008)

Axon Guidance. Axon guidance pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2567351&req=5

Figure 4: Axon Guidance. Axon guidance pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
Mentions: BLpLog identifies 8 highly differentiated genes with the test AUC = 0.93 with the computational time of 3.2 minutes. BLogReg selects 13 genes with the predicted AUC = 0.90 and computational time of 1.6 minutes. SparseLOGREG identifies 10 genes with the predicted AUC = 0.936 and computational time of 127 minutes (2 hours). The selected genes with different methods are not completely the same but highly correlated. Again with the minimal difference in performance and big differences in computational time between BLpLog and SparseLOGREG, obviously BLpLog is preferred. The highly correlated genes and corresponding pathways are selected with the integrated algorithm. Table 3 and Table 4 are the computational results. Seven pathways was identified from the data. The plots of Antigen Processing and Presentation and Axon Guidance pathways are shown in Figure 3 – Figure 4. All eight pathways identified are important in hepatocellular carcinoma and other cancers. For example, PURINE METABOLISM pathway is one of the metabolism pathways involved in nucleotide synthesis and degradation, amino acid catabolism, non-essential amino acid synthesis and the urea cycle. Understanding the mechanism involved in metabolic regulation has important implications in both biotechnology and medicine. It is estimated that at least a third of all serious health problems are caused by metabolic disorders. Analyzing differentiated expressed genes on the pathway may provide some insight on the early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Other pathways such as CYTOKINE-CYTOKINE RECEPTOR INTERACTION, NEUROACTIVE LIGAND-RECEPTOR INTERACTION, MAPK SIGNALING PATHWAY, and GAP JUNCTION are all hepatocellular carcinoma related. We will discuss the two pathways ANTIGEN PROCESSING AND PRESENTATION and AXON GUIDANCE in more details.

Bottom Line: It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance.The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation.We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA. zliu@umm.edu

ABSTRACT

Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in recent years in the area of bioinformatics and computational biology. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together on the same pathway biologically. Finally, in sparse logistic regression with Lp (p < 1) penalty, the degree of the sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time consuming.

Results: In this paper, we proposed a simple Bayesian approach to integrate the regularization parameter out analytically using a new prior. Therefore, there is no longer a need for parameter selection, as it is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes. Pathways of those correlated genes were identified with DAVID http://david.abcc.ncifcrf.gov/.

Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be utilized to identify important genes and pathways that are related to cancer and build a parsimonious model for future patient predictions.

Show MeSH
Related in: MedlinePlus