Limits...
Gene and pathway identification with Lp penalized Bayesian logistic regression.

Liu Z, Gartenhaus RB, Tan M, Jiang F, Jiao X - BMC Bioinformatics (2008)

Bottom Line: It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance.The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation.We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA. zliu@umm.edu

ABSTRACT

Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in recent years in the area of bioinformatics and computational biology. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together on the same pathway biologically. Finally, in sparse logistic regression with Lp (p < 1) penalty, the degree of the sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time consuming.

Results: In this paper, we proposed a simple Bayesian approach to integrate the regularization parameter out analytically using a new prior. Therefore, there is no longer a need for parameter selection, as it is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes. Pathways of those correlated genes were identified with DAVID http://david.abcc.ncifcrf.gov/.

Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be utilized to identify important genes and pathways that are related to cancer and build a parsimonious model for future patient predictions.

Show MeSH

Related in: MedlinePlus

FOCAL ADHESION. FOCAL ADHESION pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2567351&req=5

Figure 5: FOCAL ADHESION. FOCAL ADHESION pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.

Mentions: BLpLog has identified 14 genes that are highly differentiated expressed in glioblastomas and anaplastic oligodendrogliomas (predicted AUC = 0.98). Eight pathways and associated correlated genes are identified with the integrated algorithm. The computational results are given in Table 5 and 6. The plots of FOCAL ADHESION and RIBOSOME pathways are shown in Figure 5 – Figure 6. The eight identified pathways are important in malignant gliomas and other diseases. For instance, COMPLEMENT AND COAGULATION CASCADES are composed of serine proteases that are activated through partial cleavage by an upstream enzyme. The elements of these cascades share several common structural characteristics, including a highly conserved catalytic site composed of Ser, His and Asp. The common principle underlying the organization of these systems is that proteases exist as inactive zymogens and are subsequently activated by upstream, active proteases. The initial activation might occur as a result of contact with a non-enzymatic ligand or cleavage by another protease. Understanding the interplay between complement and coagulation has fundamental clinical implications in the context of cancers with an inflammatory pathogenesis. Migration and invasion are important prerequisites for the infiltrative and destructive growth patterns of malignant gliomas. The glioma cell invasiveness depends on proteases of the coagulation and complement cascades. Another pathway, GLUTATHIONE METABOLISM, works through the operation of a group of enzymes called glutathione S-transferases (GST). Glutathione (GSH) plays a critical role in cellular mechanisms that result in cell death. The high glutathione levels may cause resistance to chemotherapy drugs. One interesting study by researchers in Texas showed that your chances of surviving a type of brain cancer, called primary malignant glioma, could depend on the type of glutathione-s-transferase (GST) gene you were born with. Therefore, it is possible to target glutathione metabolism in the prevention and treatment of malignant gliomas. Here we discuss the FOCAL ADHESION and RIBOSOME pathways in more details.


Gene and pathway identification with Lp penalized Bayesian logistic regression.

Liu Z, Gartenhaus RB, Tan M, Jiang F, Jiao X - BMC Bioinformatics (2008)

FOCAL ADHESION. FOCAL ADHESION pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2567351&req=5

Figure 5: FOCAL ADHESION. FOCAL ADHESION pathway and the associated genes. The over-expressed and down-expressed genes on the pathway are shown in red and blue respectively.
Mentions: BLpLog has identified 14 genes that are highly differentiated expressed in glioblastomas and anaplastic oligodendrogliomas (predicted AUC = 0.98). Eight pathways and associated correlated genes are identified with the integrated algorithm. The computational results are given in Table 5 and 6. The plots of FOCAL ADHESION and RIBOSOME pathways are shown in Figure 5 – Figure 6. The eight identified pathways are important in malignant gliomas and other diseases. For instance, COMPLEMENT AND COAGULATION CASCADES are composed of serine proteases that are activated through partial cleavage by an upstream enzyme. The elements of these cascades share several common structural characteristics, including a highly conserved catalytic site composed of Ser, His and Asp. The common principle underlying the organization of these systems is that proteases exist as inactive zymogens and are subsequently activated by upstream, active proteases. The initial activation might occur as a result of contact with a non-enzymatic ligand or cleavage by another protease. Understanding the interplay between complement and coagulation has fundamental clinical implications in the context of cancers with an inflammatory pathogenesis. Migration and invasion are important prerequisites for the infiltrative and destructive growth patterns of malignant gliomas. The glioma cell invasiveness depends on proteases of the coagulation and complement cascades. Another pathway, GLUTATHIONE METABOLISM, works through the operation of a group of enzymes called glutathione S-transferases (GST). Glutathione (GSH) plays a critical role in cellular mechanisms that result in cell death. The high glutathione levels may cause resistance to chemotherapy drugs. One interesting study by researchers in Texas showed that your chances of surviving a type of brain cancer, called primary malignant glioma, could depend on the type of glutathione-s-transferase (GST) gene you were born with. Therefore, it is possible to target glutathione metabolism in the prevention and treatment of malignant gliomas. Here we discuss the FOCAL ADHESION and RIBOSOME pathways in more details.

Bottom Line: It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance.The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation.We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes.

View Article: PubMed Central - HTML - PubMed

Affiliation: Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, MD 21201, USA. zliu@umm.edu

ABSTRACT

Background: Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in recent years in the area of bioinformatics and computational biology. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together on the same pathway biologically. Finally, in sparse logistic regression with Lp (p < 1) penalty, the degree of the sparsity obtained is determined by the value of the regularization parameter. Usually this parameter must be carefully tuned through cross-validation, which is time consuming.

Results: In this paper, we proposed a simple Bayesian approach to integrate the regularization parameter out analytically using a new prior. Therefore, there is no longer a need for parameter selection, as it is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to hunt the regulatory genes with low expression changes but having high correlation with the selected genes. Pathways of those correlated genes were identified with DAVID http://david.abcc.ncifcrf.gov/.

Conclusion: Experimental results with gene expression data demonstrate that the proposed methods can be utilized to identify important genes and pathways that are related to cancer and build a parsimonious model for future patient predictions.

Show MeSH
Related in: MedlinePlus