Network-based biomarkers enhance classical approaches to prognostic gene expression signatures.

Barter RL, Schramm SJ, Mann GJ, Yang YH - BMC Syst Biol (2014)

Bottom Line: We quantified resulting patterns of misclassification and discussed the relative value of each relative to ongoing development of prognostic biomarkers. We also found that the network-based NetRank feature selection method was the most stable. We have seen this clearly for the first time because of our in-depth analysis at the level of individual patients.


ABSTRACT

Background: Classical approaches to predicting patient clinical outcome via gene expression information are primarily based on differential expression of unrelated genes (single-gene approaches) or genes related by, for example, biologic pathway or function (gene-sets). Recently, network-based approaches utilising interaction information between genes have emerged. An open problem is whether such approaches add value to the more traditional methods of signature modelling. We explored this question via comparison of the most widely employed single-gene, gene-set, and network-based methods, using gene expression microarray data from two different cancers: melanoma and ovarian. We considered two kinds of network approaches. The first of these identifies informative genes using gene expression and network connectivity information combined, the latter drawn from prior knowledge of protein-protein interactions. The second approach focuses on identification of informative sub-networks (small networks of interacting proteins, again from prior knowledge networks). For all methods we performed 100 rounds of 5-fold cross-validation under 3 different classifiers. For network-based approaches, we considered two different protein-protein interaction networks. We quantified resulting patterns of misclassification and discussed the relative value of each relative to ongoing development of prognostic biomarkers.

Results: We found that single-gene, gene-set and network methods yielded similar error rates in melanoma and ovarian cancer data. Crucially, however, our novel and detailed patient-level analyses revealed that the different methods were correctly classifying alternate subsets of patients in each cohort. We also found that the network-based NetRank feature selection method was the most stable.

Conclusions: Next-generation methods of gene expression signature modelling harness data from external networks and are foreshadowed as a standard mode of analysis. But what do they add to traditional approaches? Our findings indicate there is value in the way in which different subspaces of the patient sample are captured differently among the various methods, highlighting the possibility of 'combination' classifiers capable of identifying which patients will be more accurately classified by one particular method over another. We have seen this clearly for the first time because of our in-depth analysis at the level of individual patients.



Figure 3: Class-specific classification error rates. Error rates for the good prognosis (GP, dotted line) and poor prognosis (PP, solid line) classes, averaged over the 100 rounds of 5-fold cross-validation for each method, are presented for the iRefWeb network under the RF, SVM and DLDA classifiers using A) the melanoma data set and B) the ovarian cancer data set.

Mentions: An evaluation of the class-specific (good versus poor prognosis) error rates for each of the methods revealed that patients with good prognosis were easier to classify than patients with poor prognosis in the melanoma data set (Figure 3A). Specifically, for the RF classifier, error rates across all methods ranged from 34-47% for the PP class and from 25-32% for the GP class. Under SVM classification, error rates ranged from 36-58% for the PP class and from 21-32% for the GP class. Using the DLDA classifier, error rates ranged from 29-51% for the PP class and from 26-34% for the GP class. The only exceptions to this observation were the single-gene moderated t-statistic and NetRank methods under the DLDA classifier, for which the PP and GP classes had similar classification error rates.
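To make the quantity in Figure 3 concrete, the following is a minimal sketch of how class-specific error rates might be averaged over repeated rounds of k-fold cross-validation. It is not the authors' pipeline. The nearest-centroid classifier is a stand-in for the RF, SVM and DLDA classifiers, the fold assignment is stratified by class (an assumption; the text does not say how folds were drawn), the data are synthetic, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, spreading each class evenly (assumed scheme)."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier: store the mean expression profile of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def class_specific_error_rates(X, y, rounds=100, k=5, seed=0):
    """Average, over `rounds` repetitions of k-fold CV, the error rate within each class."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    per_round = {c: [] for c in classes}
    for _ in range(rounds):
        errors = {c: 0 for c in classes}
        totals = {c: 0 for c in classes}
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            centroids = fit_centroids([X[i] for i in train_idx],
                                      [y[i] for i in train_idx])
            for i in test_idx:
                totals[y[i]] += 1
                if predict(centroids, X[i]) != y[i]:
                    errors[y[i]] += 1
        # Every sample is tested exactly once per round, so totals are non-zero.
        for c in classes:
            per_round[c].append(errors[c] / totals[c])
    return {c: sum(v) / len(v) for c, v in per_round.items()}


def make_toy_data(n_per_class=20, n_genes=5, shift=1.0, seed=1):
    """Synthetic expression matrix: PP samples are shifted relative to GP samples."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("GP", 0.0), ("PP", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(class_specific_error_rates(X, y, rounds=100, k=5))
```

Reporting the error rate separately for each class, rather than one overall rate, is what exposes the asymmetry described above, where one prognosis group is misclassified more often than the other.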

