Integrative analysis of survival-associated gene sets in breast cancer.

Varn FS, Ung MH, Lou SK, Cheng C - BMC Med Genomics (2015)

Bottom Line: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.


Affiliation: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, 03755, USA. Frederick.S.Varn.Jr.GR@dartmouth.edu.

ABSTRACT

Background: Patient gene expression information has recently become a clinical feature used to evaluate breast cancer prognosis. The emergence of prognostic gene sets that take advantage of these data has led to a rich library of information that can be used to characterize the molecular nature of a patient's cancer. Identifying robust gene sets that are consistently predictive of a patient's clinical outcome has become one of the main challenges in the field.

Methods: We supplied patient gene expression data and gene sets from MSigDB to our previously established BASE algorithm to develop the gene set activity score (GSAS), a metric that quantitatively assesses a gene set's activity level in a given patient. We used this metric, along with patient time-to-event data, to perform survival analyses and identify the gene sets that were significantly correlated with patient survival. We then performed cross-dataset analyses to identify robust prognostic gene sets and to classify patients by metastasis status. Additionally, we created a gene set network based on component gene overlap to explore the relationship between gene sets derived from MSigDB. We developed a novel gene set based on this network's topology and applied the GSAS metric to characterize its role in patient survival.

Results: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. The gene overlap network analysis yielded a novel gene set enriched in genes shared by the robustly predictive gene sets. This gene set was highly correlated with patient survival when used alone. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression.

Conclusions: The GSAS metric provided a useful medium by which we systematically investigated how gene sets from MSigDB relate to breast cancer patient survival. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.


Figure 6: A gene set network reveals a module highly enriched in cell proliferation genes. Gene sets significantly associated with survival (FDR < 0.01) in the van de Vijver dataset were selected and dichotomized based on having a negative effect on survival (hazard ratio, HR ≥ 1.00) or a positive effect (HR < 1.00). A network was then created from the two groups, linking the gene sets (nodes) by the number of genes they had in common (edges). This analysis revealed a module (solid box) made up of the robust gene sets, whose genes were enriched in cell proliferation-related functions.

Mentions: Our success in using GSASs to characterize the gene sets of MSigDB led us to seek ways to streamline our analyses further. We reasoned that the GSASs from a gene set may be correlated with patient survival because of a few strong survival-associated genes rather than the overall behavior of the gene set. To test this, we investigated the shared-gene relationship among the gene sets, as genetic similarity between gene sets may imply comparable prognostic association. We applied a network-based approach (see Methods) to visualize this relationship between survival-associated gene sets, which allowed us to better identify collections of gene sets that have many genes in common (Figure 6). Briefly, each node in the network represented a gene set that was associated with survival in the van de Vijver dataset (FDR < 0.01), with an edge present between two gene sets if the number of genes they shared divided by the number of genes in their union was > 0.20. We identified a module in the network that was enriched in the robust gene sets we had described earlier (solid box, Figure 6), which indicated that many of these gene sets have high gene overlap. To characterize the prognostic contribution of these shared genes, we selected the genes that were present in at least 40% of the module gene sets to create what we termed the module's core gene set (Additional file 4).
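The two rules described above (an edge when shared genes divided by the union exceeds 0.20, and a core gene set made of genes present in at least 40% of the module's gene sets) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy gene sets, and the gene identifiers are hypothetical, and the module membership is assumed to be given rather than detected from the network.

```python
from collections import Counter
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Shared genes divided by genes in the union of the two sets."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


def build_edges(gene_sets, threshold=0.20):
    """Return pairs of gene set names whose overlap ratio exceeds threshold.

    gene_sets: dict mapping a gene set name to a set of gene identifiers.
    The 0.20 default follows the cutoff stated in the text.
    """
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        if jaccard(gene_sets[name_a], gene_sets[name_b]) > threshold:
            edges.append((name_a, name_b))
    return edges


def core_gene_set(module_sets, min_fraction=0.40):
    """Genes present in at least min_fraction of the module's gene sets."""
    counts = Counter(gene for genes in module_sets for gene in set(genes))
    cutoff = min_fraction * len(module_sets)
    return {gene for gene, n in counts.items() if n >= cutoff}


# Toy data (hypothetical names, not taken from MSigDB).
toy_sets = {
    "SET_A": {"G1", "G2", "G3", "G4"},
    "SET_B": {"G2", "G3", "G4", "G5"},
    "SET_C": {"G3", "G4", "G6"},
    "SET_D": {"G7", "G8"},
}

edges = build_edges(toy_sets)
# Assume SET_A, SET_B, and SET_C were identified as the module.
core = core_gene_set([toy_sets[n] for n in ("SET_A", "SET_B", "SET_C")])
print(edges)
print(sorted(core))
```

In this toy example SET_D shares no genes with the other sets and so stays unconnected, while the three overlapping sets form a small fully connected group. How the module itself is delimited in the real network is described in the paper's Methods and is not reproduced here.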

