Prognostic gene signature identification using causal structure learning: applications in kidney cancer.

Ha MJ, Baladandayuthapani V, Do KA - Cancer Inform (2015)

Bottom Line: The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules that constitute a gene and its corresponding regulators. The modules are then subsequently used to correlate with survival times, thus, allowing for a network-oriented approach to gene selection to adjust for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA) where we find several prognostic genes associated with cancer progression - some of which are novel while others confirm existing findings.


Affiliation: Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA.

ABSTRACT
Identification of molecular-based signatures is one of the critical steps toward finding therapeutic targets in cancer. In this paper, we propose methods to discover prognostic gene signatures under a causal structure learning framework across the whole genome. The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules that constitute a gene and its corresponding regulators. The modules are then subsequently used to correlate with survival times, thus, allowing for a network-oriented approach to gene selection to adjust for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA) where we find several prognostic genes associated with cancer progression - some of which are novel while others confirm existing findings.
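
The notion of a gene-specific module (a gene together with its regulators) can be illustrated with a minimal sketch. It assumes the learned graph is available as a list of directed (parent, child) edges; the function name `build_modules` is hypothetical and not from the paper. The edges used are the parent relations reported in the article for BAT3 and EEF1A1. An estimated CPDAG may also contain undirected edges, which this sketch does not handle.

```python
def build_modules(edges):
    """Map each gene to the sorted list of its parents (regulators).

    `edges` is an iterable of (parent, child) pairs from a directed graph.
    Genes with no incoming edge get an empty parent list.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    return {gene: sorted(pa) for gene, pa in parents.items()}


# Parent relations reported in the article (directed edges only).
edges = [
    ("BAT2", "BAT3"),
    ("FLOT1", "BAT3"),
    ("NOC2L", "BAT3"),
    ("EEF1A1P9", "EEF1A1"),
]

modules = build_modules(edges)
for gene in ("BAT3", "EEF1A1"):
    # Covariates for the gene's survival model: the gene plus its parents.
    print(gene, [gene] + modules[gene])
```

Each module then supplies the covariate set for that gene's survival model, in addition to the clinical covariates.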


Figure 4. Scatter plot of the effect sizes of all genes in relation to survival time, from the unadjusted model versus the network-adjusted model. The names of the top four genes from the network-adjusted model are listed on the graph. The green line represents equal effect sizes between the two models. The red dashed vertical and horizontal lines are drawn at the maximum in the absolute effects from the unadjusted model (0.35). The areas A1 and A2 indicate that effect sizes from the network-adjusted model are greater than the maximum in the effect sizes from the unadjusted model.


Mentions: Our gene ranking is based on the Cox proportional hazards model, adjusted for the estimated CPDAG and four clinical covariates: patient age, tumor stage, grade, and metastasis status. We refer to this model as the network-adjusted model. To benchmark our method, we considered a model that includes the gene expression and the four clinical covariates but no parent genes, and we refer to it as the unadjusted model. The network-adjusted model, which includes the set of parent genes for each gene, therefore has more parameters than the unadjusted model. Figure 4 displays the scatter plot of the effect sizes from the unadjusted model versus the network-adjusted model for all 14,576 genes. The slope of the regression line in the scatter plot was 0.893 with zero intercept. A slope less than 1 indicates that the effect sizes from the network-adjusted model are, overall, smaller than those from the unadjusted model. Nevertheless, several genes show evident increases in their effect sizes after the set of parent genes is added: these lie in areas A1 and A2 of Figure 4, where the effect sizes from the network-adjusted model exceed the maximum absolute effect size from the unadjusted model. In particular, the BAT3 gene showed a sign change in effect size from the unadjusted model (0.04) to the network-adjusted model (−0.5), with a 1,136.57% relative increase in effect size after adding the parent genes BAT2, FLOT1, and NOC2L (Table 1). Both BAT3 and BAT2 are HLA-B-associated transcripts, and the sequences of the two genes were shown to be closely linked [22]. The EEF1A1 gene showed a 1,311.37% relative increase in effect size from the unadjusted model to the network-adjusted model after adjusting for the EEF1A1P9 gene (Table 1). EEF1A1P9 is a pseudogene: a dysfunctional gene with a sequence similar to that of EEF1A1 that has lost its protein-coding ability or is no longer expressed [23]. Regulated by the pseudogene EEF1A1P9, the EEF1A1 gene has a significant effect on survival times. In summary, the top-ranked genes from the network-adjusted model lie in areas A1 and A2 of Figure 4, which indicates that most of the top-ranked genes are not found by the unadjusted model.
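
The comparison above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the coefficients are made-up toy values, the function names are hypothetical, the regression line is assumed to be fitted through the origin, and the relative increase is assumed to be the percent change in absolute effect size (the article does not state its formula).

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept fixed at zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


def relative_increase_pct(unadjusted, adjusted):
    """Percent change in absolute effect size (assumed definition)."""
    u = np.abs(np.asarray(unadjusted, dtype=float))
    a = np.abs(np.asarray(adjusted, dtype=float))
    return 100.0 * (a - u) / u


def exceeds_unadjusted_max(unadjusted, adjusted):
    """Flag genes whose adjusted |effect| exceeds the largest unadjusted
    |effect|, i.e. genes that would fall in areas A1 or A2."""
    threshold = np.max(np.abs(np.asarray(unadjusted, dtype=float)))
    return np.abs(np.asarray(adjusted, dtype=float)) > threshold


# Toy coefficients for five hypothetical genes (not data from the study).
unadj = np.array([0.04, 0.30, -0.35, 0.10, -0.20])
adj = np.array([-0.50, 0.25, -0.30, 0.08, -0.18])

print("slope:", slope_through_origin(unadj, adj))
print("relative increase (%):", relative_increase_pct(unadj, adj))
print("in A1/A2:", exceeds_unadjusted_max(unadj, adj))
```

Taking absolute values before comparing lets a sign change, such as the one reported for BAT3, still register as an increase in magnitude.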

