Integrative analysis of survival-associated gene sets in breast cancer.

Varn FS, Ung MH, Lou SK, Cheng C - BMC Med Genomics (2015)

Bottom Line: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.

Affiliation: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, 03755, USA. Frederick.S.Varn.Jr.GR@dartmouth.edu.

ABSTRACT

Background: Patient gene expression information has recently become a clinical feature used to evaluate breast cancer prognosis. The emergence of prognostic gene sets that take advantage of these data has led to a rich library of information that can be used to characterize the molecular nature of a patient's cancer. Identifying robust gene sets that are consistently predictive of a patient's clinical outcome has become one of the main challenges in the field.

Methods: We supplied patient gene expression data and gene sets from MSigDB to our previously established BASE algorithm to develop the gene set activity score (GSAS), a metric that quantitatively assesses a gene set's activity level in a given patient. We used this metric, along with patient time-to-event data, to perform survival analyses that identified the gene sets significantly correlated with patient survival. We then performed cross-dataset analyses to identify robust prognostic gene sets and to classify patients by metastasis status. Additionally, we created a gene set network based on component gene overlap to explore the relationship between gene sets derived from MSigDB. We developed a novel gene set based on this network's topology and applied the GSAS metric to characterize its role in patient survival.
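The gene set network described above links gene sets through the genes they share. The sketch below is one possible way to build such a network. It is not the authors' implementation: the Jaccard index as the overlap measure, the `min_overlap` threshold of 0.25, and the function names are all assumptions, and the gene sets are toy placeholders rather than MSigDB entries.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by genes present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_network(gene_sets, min_overlap=0.25):
    """Return weighted edges (set_a, set_b, overlap) for sufficiently similar pairs.

    The overlap measure and the threshold are illustrative assumptions.
    """
    edges = []
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[name_a], gene_sets[name_b])
        if score >= min_overlap:
            edges.append((name_a, name_b, score))
    return edges


# Toy gene sets with placeholder gene names (not taken from MSigDB).
toy_sets = {
    "SET_A": {"G1", "G2", "G3", "G4"},
    "SET_B": {"G3", "G4", "G5"},
    "SET_C": {"G6", "G7"},
}
edges = overlap_network(toy_sets)
```

Here only SET_A and SET_B share genes, so they form the single edge in the toy network. Highly connected regions of a network like this are where one would look for genes shared across many gene sets.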

Results: Using the GSAS metric, we identified 120 gene sets that were significantly associated with patient survival in all datasets tested. The gene overlap network analysis yielded a novel gene set enriched in genes shared by the robustly predictive gene sets. This gene set was highly correlated with patient survival when used alone. Most interestingly, removal of the genes in this gene set from the gene pool on MSigDB resulted in a large reduction in the number of predictive gene sets, suggesting a prominent role for these genes in breast cancer progression.

Conclusions: The GSAS metric provided a useful medium by which we systematically investigated how gene sets from MSigDB relate to breast cancer patient survival. We used this metric to identify predictive gene sets and to construct a novel gene set containing genes heavily involved in cancer progression.


Figure 4: Hierarchical clustering of samples using the GSASs for the 120 robust gene sets. (A) A heatmap showing the pattern of GSASs for the 120 robust gene sets across the 295 samples in the van de Vijver dataset. Each row represents one sample’s GSAS profile for each of the 120 robust gene sets and each column represents the GSASs across all samples for one of the robust gene sets. To show contrast, all GSASs less than −3 or greater than 3 were adjusted to −3 and 3, respectively. Green is indicative of a lower (more negative) GSAS for a sample while red is indicative of a higher (more positive) GSAS for a sample. (B) Hierarchical clustering of the samples based on GSAS in the robust gene sets reveals two distinct groups of samples. The red group is enriched in samples with ER- breast cancer and distant metastasis occurrence compared to the green group.
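The two steps in this caption, capping scores at ±3 for display and splitting samples into two groups by hierarchical clustering, can be sketched as follows. This is a minimal illustration under stated assumptions: the caption does not give the distance measure or linkage method, so Euclidean distance and average linkage are assumed here. Clustering is run on the unclipped scores, which is also an assumption, and the GSAS values are made up.

```python
import math


def clip(value, low=-3.0, high=3.0):
    """Cap a score to [low, high], as described for the heatmap display."""
    return max(low, min(high, value))


def euclidean(u, v):
    """Euclidean distance between two GSAS profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles, c1, c2):
    """Mean pairwise distance between the samples of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_samples(profiles, n_groups=2):
    """Agglomerative clustering: merge the two closest clusters until n_groups remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(profiles, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy GSAS matrix: rows are samples, columns are gene sets (values are made up).
gsas = [
    [4.2, 2.5, 3.1],
    [3.8, 2.9, 5.0],
    [-2.7, -3.6, -1.9],
    [-4.1, -2.2, -2.8],
]
groups = cluster_samples(gsas)                       # sample indices per group
heatmap = [[clip(x) for x in row] for row in gsas]   # values capped for display only
```

In this toy matrix the two high-scoring samples end up in one group and the two low-scoring samples in the other, mirroring the red and green groups in panel B.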

Mentions: We additionally examined whether the GSASs for these gene sets could be used to distinguish clinically relevant subgroups. We hierarchically clustered samples from the van de Vijver et al. dataset based on each sample’s GSAS for each of the gene sets tested. We then looked at whether clinical features such as ER status, lymph node metastasis status, and distant metastasis occurrence clustered as well. Figure 4 displays a heatmap detailing the results of this analysis. Samples were split into a red group and a green group based on where they clustered. The red group was enriched in samples with ER- breast cancer and distant metastasis occurrence relative to the green group, both indicators of more severe disease. The trends observed here suggest that GSASs from the robust gene sets are capturing clinically relevant processes in addition to survival information.
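One common way to check whether a clinical feature is enriched in one cluster is a one-sided Fisher's exact test on a 2x2 table. The excerpt does not say which test the authors used, so the sketch below is an assumption, and the counts are hypothetical and not taken from the paper.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Hypothetical counts, NOT from the paper.
# Rows: cluster (red, green). Columns: ER status (ER-, ER+).
red_er_neg, red_er_pos = 8, 2
green_er_neg, green_er_pos = 1, 9
p_value = fisher_one_sided(red_er_neg, red_er_pos, green_er_neg, green_er_pos)
```

A small `p_value` would indicate that ER- samples are over-represented in the red group beyond what random assignment to clusters would be expected to produce.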

