Prognostic gene signature identification using causal structure learning: applications in kidney cancer.

Ha MJ, Baladandayuthapani V, Do KA - Cancer Inform (2015)

Bottom Line: The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules that comprise a gene and its corresponding regulators. The modules are then correlated with survival times, thus allowing a network-oriented approach to gene selection that adjusts for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA), where we find several prognostic genes associated with cancer progression, some of which are novel while others confirm existing findings.


Affiliation: Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA.

ABSTRACT
Identification of molecular-based signatures is one of the critical steps toward finding therapeutic targets in cancer. In this paper, we propose methods to discover prognostic gene signatures under a causal structure learning framework across the whole genome. The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules that comprise a gene and its corresponding regulators. The modules are then correlated with survival times, thus allowing a network-oriented approach to gene selection that adjusts for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA), where we find several prognostic genes associated with cancer progression, some of which are novel while others confirm existing findings.


Figure 1: Workflow to obtain the whole genome causal structure. Pairs of genes (edges) are sequentially excluded by conditional (marginal) independence tests, starting from a completely connected graph and arriving at a skeleton. V-structure detection and completion steps then follow. PDAG denotes a partially directed acyclic graph and CPDAG a completed PDAG.
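The edge-exclusion step in this workflow can be illustrated with a minimal sketch in the style of the PC algorithm. It is not the authors' implementation: the use of a Fisher z-test on partial correlations, the significance level `alpha`, and the function names `partial_corr`, `is_independent`, and `skeleton` are all assumptions made for illustration. The sketch starts from a completely connected graph and removes an edge as soon as some conditioning set of neighbors renders the pair independent; V-structure detection and completion are not shown.

```python
import math
from itertools import combinations
from statistics import NormalDist


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the variables in cond (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(corr, n, i, j, cond, alpha):
    """Fisher z-test (an assumed choice of test): True if independence is not rejected."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return abs(z) <= NormalDist().inv_cdf(1 - alpha / 2)


def skeleton(corr, n, alpha=0.05):
    """Thin a complete graph into an undirected skeleton.

    corr is a p x p correlation matrix (list of lists) and n the sample size.
    Conditioning sets grow by one variable per level, starting with marginal tests.
    """
    p = len(corr)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                # Condition on subsets of i's other current neighbors.
                for cond in combinations(sorted(adj[i] - {j}), level):
                    if is_independent(corr, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {frozenset((i, j)) for i in range(p) for j in adj[i]}


# Toy correlation matrix for a chain 0 -> 1 -> 2 (hypothetical values):
# variables 0 and 2 are correlated marginally but not once variable 1 is given.
chain_corr = [
    [1.00, 0.60, 0.36],
    [0.60, 1.00, 0.60],
    [0.36, 0.60, 1.00],
]
edges = skeleton(chain_corr, n=200)
```

In the toy example the pair (0, 2) survives the marginal test but is dropped once variable 1 enters the conditioning set, which is the sense in which edges are excluded "sequentially". For whole-genome data the recursive partial correlation would be too slow; it is used here only to keep the sketch dependency-free.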
© Copyright Policy - open-access
Mentions: We propose an approach for estimating the effect of each gene on patient survival, adjusted for the causal structure of all the genes of interest. The causal structure defines a module for each gene, consisting of the gene and its parents, where the parents of a gene are the genes with a directed edge pointing toward it in the graph. The main challenge is that these modules cannot be uniquely identified from observational gene expression data. To address this issue, we propose a principled statistical procedure that consists of two main parts: (1) estimating the causal structure, which includes direct and indirect relations among genes, for high-dimensional gene expression data and (2) evaluating the effect of each gene under the ambiguous causal structure. Figure 1 summarizes the entire workflow of our method. Briefly, the causal structure is estimated through a sequence of undirected and partially directed graphs in Steps 1 to 6, with edges thinned at each step; an edge carries a different dependency interpretation depending on the graph in which it appears. The final causal structure, represented by the completed partially directed acyclic graph (CPDAG) in Step 6, retains undirected edges wherever the direction is not identifiable. To handle this non-identifiability, an effect size for each gene is obtained from a Cox proportional hazards model for every possible module consistent with the CPDAG, and the minimum of these effect sizes is used to rank the genes. In contrast to single-gene analysis, we refine the estimate of effect size using the estimated causal structure and show how this leads to better quantification of the prognostic effects. In the following subsections, we describe in detail our method for the causal structure estimation and the effect size evaluation given the causal structure.
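The second part of the procedure, scoring a gene across all of its possible modules, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `candidate_parent_sets`, `min_abs_effect`, and the `estimate_effect` callable are hypothetical names, and `estimate_effect` stands in for fitting a Cox proportional hazards model of survival on the gene while adjusting for a given parent set. The sketch also assumes that "minimum effect size" means the smallest effect in absolute value, and it enumerates every subset of a gene's undirected neighbors as additional parents without checking that each orientation is consistent with the CPDAG.

```python
from itertools import combinations


def candidate_parent_sets(gene, directed_parents, undirected_neighbors):
    """Yield each possible parent set of `gene` implied by a CPDAG.

    directed_parents maps a gene to the genes with a directed edge into it;
    undirected_neighbors maps a gene to the genes joined to it by an
    undirected edge, each of which may or may not be a parent.
    """
    fixed = frozenset(directed_parents.get(gene, ()))
    free = sorted(undirected_neighbors.get(gene, ()))
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            yield fixed | frozenset(extra)


def min_abs_effect(gene, directed_parents, undirected_neighbors, estimate_effect):
    """Return the effect estimate with the smallest absolute value over all modules.

    estimate_effect(gene, parents) is a hypothetical callable, e.g. the
    coefficient of `gene` in a Cox model that also includes `parents`.
    """
    effects = [
        estimate_effect(gene, parents)
        for parents in candidate_parent_sets(gene, directed_parents, undirected_neighbors)
    ]
    return min(effects, key=abs)


# Toy example with made-up numbers: gene "A" has one definite parent "B"
# and one undirected neighbor "C", giving two candidate modules.
toy_effects = {
    frozenset({"B"}): 0.8,
    frozenset({"B", "C"}): -0.3,
}


def toy_estimate(gene, parents):
    """Stand-in for a fitted Cox coefficient; looks up a made-up value."""
    return toy_effects[parents]


score = min_abs_effect("A", {"A": {"B"}}, {"A": {"C"}}, toy_estimate)
```

Taking the minimum over candidate modules is conservative: a gene ranks highly only if its estimated effect stays large under every parent set that the CPDAG allows.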

