Integrative analysis of survival-associated gene sets in breast cancer.

Varn FS, Ung MH, Lou SK, Cheng C - BMC Med Genomics (2015)

Bottom Line: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.


Affiliation: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, 03755, USA. Frederick.S.Varn.Jr.GR@dartmouth.edu.

ABSTRACT

Background: Patient gene expression information has recently become a clinical feature used to evaluate breast cancer prognosis. The emergence of prognostic gene sets that take advantage of these data has led to a rich library of information that can be used to characterize the molecular nature of a patient's cancer. Identifying robust gene sets that are consistently predictive of a patient's clinical outcome has become one of the main challenges in the field.

Methods: We applied our previously established BASE algorithm to patient gene expression data and gene sets from MSigDB to develop the gene set activity score (GSAS), a metric that quantitatively assesses a gene set's activity level in a given patient. We used this metric, along with patient time-to-event data, to perform survival analyses and identify the gene sets that were significantly correlated with patient survival. We then performed cross-dataset analyses to identify robust prognostic gene sets and to classify patients by metastasis status. Additionally, we created a gene set network based on component gene overlap to explore the relationship between gene sets derived from MSigDB. We developed a novel gene set based on this network's topology and applied the GSAS metric to characterize its role in patient survival.

Results: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. The gene overlap network analysis yielded a novel gene set enriched in genes shared by the robustly predictive gene sets. This gene set was highly correlated with patient survival when used alone. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression.

Conclusions: The GSAS metric provided a useful medium by which we systematically investigated how gene sets from MSigDB relate to breast cancer patient survival. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.


Figure 1: The GSAS of VANTVEER_BREAST_CANCER_METASTASIS_DN predicts patient survival. (A) The distribution of genes from this gene set in an expression-ranked gene list in samples with a low (Sample X), intermediate (Sample Y), and high (Sample Z) GSAS. (B) The distribution of GSASs across all samples in a dataset. (C) Patients with positive GSASs (red curve) show significantly shorter survival times than those with negative GSASs (green curve). Vertical hash marks indicate points of censored data. (D) In a Cox PH model, this GSAS significantly predicts patient survival even after adjusting for traditional clinical features. A red dotted line indicates where the hazard ratio = 1.
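The comparison in panel (C) can be illustrated with a minimal sketch: patients are split by the sign of their GSAS, and a Kaplan-Meier survival curve is estimated for each group. This is an illustration only, not the authors' code. The function names `kaplan_meier` and `split_by_gsas_sign` are hypothetical, and the choice to leave out patients with a GSAS of exactly zero is an assumption, since the caption only mentions positive and negative scores.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient.
    events: 1 if the event was observed, 0 if the patient was censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


def split_by_gsas_sign(gsas, times, events):
    """Partition patients into positive-GSAS and negative-GSAS groups.

    Assumption: patients with a GSAS of exactly 0 are left out of both groups.
    Each group is returned as a (times, events) pair of lists.
    """
    pos = [(t, e) for g, t, e in zip(gsas, times, events) if g > 0]
    neg = [(t, e) for g, t, e in zip(gsas, times, events) if g < 0]
    unzip = lambda rows: ([t for t, _ in rows], [e for _, e in rows])
    return unzip(pos), unzip(neg)
```

Calling `kaplan_meier(*group)` on each of the two returned groups gives the step functions that a plot like panel (C) would draw. Testing whether the curves differ significantly (for example with a log-rank test) is not shown here.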

Mentions: Using the gene expression data provided by van de Vijver et al. [27], we calculated the gene set activity score (GSAS) for each gene set contained in the C2 curated gene set collection of MSigDB. Briefly, the GSAS for a sample is determined by calculating either the maximum positive or negative deviation between two empirical distribution functions. The foreground function is based on the position of the genes of a gene set in a list of genes rank-ordered by the gene expression values from the sample, while the background function is based on the position of the genes not found in the target gene set. Thus, a negative GSAS indicates low gene set activity, due to low relative expression levels of the component genes, while a positive GSAS indicates high gene set activity, due to high relative expression levels of the component genes. Figure 1A demonstrates the gene distribution of three samples exhibiting either a low (GSAS = −6), intermediate (GSAS = 1) or high (GSAS = 8) GSAS for a well-known breast cancer gene signature reported by van’t Veer et al. [2]. As expected, the genes in the sample with the low score cluster toward the left, while the genes in the sample with the high score cluster toward the right. Gene sets which have a GSAS around zero have component genes whose expression values are evenly distributed about zero. For each gene set this method was applied to, the resulting distribution of GSASs across all samples followed a bimodal distribution. This can be seen in the distribution of GSASs for the signature reported by van’t Veer et al. (Figure 1B).
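The core of the score described above, a signed maximum deviation between a foreground and a background empirical distribution function, can be sketched as follows. This is a simplified, unweighted illustration and not the BASE algorithm itself. The function name `signed_max_deviation` is hypothetical, and ranking genes from highest to lowest expression is an assumption chosen so that a positive result corresponds to high relative expression of the set's genes. The sketch returns values in [−1, 1], whereas the reported scores (for example −6 and 8) fall outside that range, so the published method evidently includes further scaling that is not reproduced here.

```python
import numpy as np


def signed_max_deviation(expression, in_set):
    """Signed maximum deviation between foreground and background ECDFs.

    expression: 1-D sequence of expression values for one sample, one per gene.
    in_set:     boolean sequence, True where the gene belongs to the gene set.
    """
    expression = np.asarray(expression, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    if in_set.all() or not in_set.any():
        raise ValueError("need at least one gene inside and one outside the set")

    # Assumption: rank genes from highest to lowest expression.
    order = np.argsort(-expression, kind="stable")
    hits = in_set[order]

    # Empirical distribution functions along the ranked gene list.
    foreground = np.cumsum(hits) / hits.sum()
    background = np.cumsum(~hits) / (~hits).sum()

    # Keep the deviation with the largest magnitude, preserving its sign.
    diff = foreground - background
    return float(diff[np.argmax(np.abs(diff))])
```

With this ranking direction, a gene set whose members sit at the top of the expression-ranked list yields a score near +1, and one whose members sit at the bottom yields a score near −1.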

