Co-acting gene networks predict TRAIL responsiveness of tumour cells with high accuracy.

O'Reilly P, Ortutay C, Gernon G, O'Connell E, Seoighe C, Boyce S, Serrano L, Szegezdi E - BMC Genomics (2014)

Bottom Line: Importantly, only 12% of the TRAIL-predictor genes were differentially expressed, highlighting the importance of functional interactions in predicting the biological response. The advantage of co-acting gene clusters is that this analysis does not depend on differential expression and can incorporate direct and indirect gene interactions as well as tissue- and cell-specific characteristics. This approach (1) identified a descriptor of TRAIL sensitivity that performs significantly better as a predictor of TRAIL sensitivity than any previously reported gene signature, (2) identified potential novel regulators of TRAIL responsiveness and (3) provided a systematic view highlighting fundamental differences between the molecular wiring of sensitive and resistant cell types.


Affiliation: Apoptosis Research Centre, National University of Ireland Galway, University Rd, Galway, Ireland. eva.szegezdi@nuigalway.ie.

ABSTRACT

Background: Identification of differentially expressed genes from transcriptomic studies is one of the most common approaches to identifying tumor biomarkers. However, this approach is not well suited to detecting interactions between genes whose protein products may influence each other, which limits its power to uncover the molecular wiring of tumour cells that dictates their response to a drug. Because signal transduction pathways are non-linear and highly interlinked, the biological response they drive may be better described by the relative amounts of their components and their functional relationships than by the absolute expression of individual components.

Results: Gene expression microarray data for 109 tumor cell lines with known sensitivity to the death-ligand cytokine tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) were used to identify genes with potential functional relationships that determine responsiveness to TRAIL-induced apoptosis. The machine-learning technique Random Forest, run with backward elimination in the statistical environment R, was used to identify the key predictors of TRAIL sensitivity, and differentially expressed genes were identified with the GeneSpring software. Gene co-regulation and statistical interaction were assessed with q-order partial correlation analysis and the non-rejection rate. Biological (functional) interactions among the co-acting genes were studied with Ingenuity network analysis. Prediction accuracy was assessed on an independent dataset by calculating the area under the receiver operating characteristic (ROC) curve. We show that the identified gene panel predicts TRAIL sensitivity with high sensitivity and specificity (AUC = 0.84). The genes in the panel are co-regulated, and at least 40% of them interact functionally in signal transduction pathways that regulate cell death and survival, cellular differentiation and morphogenesis. Importantly, only 12% of the TRAIL-predictor genes were differentially expressed, highlighting the importance of functional interactions in predicting the biological response.
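The q-order partial correlation and non-rejection rate (NRR) analysis named above can be sketched as follows. For a gene pair (i, j), the sketch repeatedly draws a random conditioning set of q other genes, computes the partial correlation of i and j given that set, and tests whether it differs from zero; the NRR is the fraction of tests in which the null hypothesis of zero partial correlation is not rejected, so a low NRR suggests an association that persists whatever else is conditioned on. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: the Fisher z test, the number of random conditioning sets (`n_tests`), `alpha` and all function names are choices made for the example.

```python
import math
import numpy as np

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j of `data` given the columns in `cond`,
    read off the inverse of the correlation submatrix (the precision matrix)."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_pvalue(r, n, q):
    """Two-sided p-value for H0: partial correlation == 0 (Fisher z approximation).
    Assumed test; requires n - q - 3 > 0."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - q - 3)
    # 2 * (1 - Phi(|z|)) expressed with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

def non_rejection_rate(data, i, j, q, n_tests=100, alpha=0.05, seed=0):
    """Fraction of random q-order conditioning sets for which H0 is NOT rejected."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    others = [k for k in range(p) if k not in (i, j)]
    not_rejected = 0
    for _ in range(n_tests):
        cond = rng.choice(others, size=q, replace=False).tolist()
        if fisher_z_pvalue(partial_corr(data, i, j, cond), n, q) > alpha:
            not_rejected += 1
    return not_rejected / n_tests
```

In a network setting, pairs whose NRR falls below a chosen cut-off would be retained as candidate edges; that cut-off is a further tuning choice not given in the text.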

Conclusions: The advantage of co-acting gene clusters is that this analysis does not depend on differential expression and can incorporate direct and indirect gene interactions as well as tissue- and cell-specific characteristics. This approach (1) identified a descriptor of TRAIL sensitivity that performs significantly better as a predictor of TRAIL sensitivity than any previously reported gene signature, (2) identified potential novel regulators of TRAIL responsiveness and (3) provided a systematic view highlighting fundamental differences between the molecular wiring of sensitive and resistant cell types.


Figure 2: Identification of the core co-acting gene set. (A) Gene ranking by Gini importance. A single “best” probeset for each gene was used to grow 10,000 classification trees. The importance of each gene in classifying cell lines as sensitive or resistant to TRAIL was measured by the mean decrease in Gini importance in the training dataset. Probesets above the red line represent the top 5% retained for further analysis. Only genes with a Gini-importance value greater than zero are plotted. (B) The top 350 genes predict TRAIL responsiveness with high accuracy. Starting from the 1000 top-ranked genes, the lowest-ranked genes were removed stepwise (in units of 100 and then 10), and the performance of the remaining gene set was determined by calculating the out-of-bag (OOB) classification error. Removal in 10-gene units between the top 300 and the top 200 genes had no effect and is therefore not shown. (C) Validation of the prediction accuracy of the 350 co-acting genes. The area under the receiver operating characteristic curve (AUC) was calculated as a measure of the model's sensitivity and specificity on the independent test dataset (black line, AUC = 0.85) and after swapping the sensitivity labels of a randomly selected 50% of the cell lines (red line, AUC = 0.48). The graph is representative of 100 random permutations. (D) The 350 co-acting genes are not identified by differential expression analysis. Histogram of the gene distribution by fold difference in expression between TRAIL-sensitive and TRAIL-resistant cell lines; the number above each column gives how many of the 350 co-acting genes fall in that fold-difference range.
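The ranking and backward-elimination steps of panels A and B can be sketched with scikit-learn's `RandomForestClassifier` standing in for the R implementation the authors used. Note that scikit-learn's `feature_importances_` is the impurity-based (Gini) measure; a permutation-based importance is a separate measure. The expression matrix `X` (cell lines × genes), the 0/1 label vector `y`, the tree count, and the point at which removal switches from 100-gene to 10-gene steps (`switch_at`) are assumptions for illustration; the figure reports 10,000 trees, which a sketch can lower for speed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_genes(X, y, n_trees=500, seed=0):
    """Return column indices of X ordered by impurity-based (Gini) importance, best first."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1]

def oob_error(X, y, n_trees=500, seed=0):
    """Out-of-bag classification error (1 - OOB accuracy) of a forest grown on X."""
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=seed)
    rf.fit(X, y)
    return 1.0 - rf.oob_score_

def elimination_sizes(start=1000, coarse=100, fine=10, switch_at=400, stop=100):
    """Subset sizes visited by backward elimination: drop `coarse` genes per step down
    to `switch_at`, then `fine` genes per step down to `stop`.
    switch_at and stop are hypothetical values; the source does not state them."""
    sizes = list(range(start, switch_at, -coarse))
    sizes += list(range(switch_at, stop - 1, -fine))
    return sizes

def backward_elimination(X, y, ranking, sizes, n_trees=500, seed=0):
    """Map each subset size k to the OOB error of a forest using only the top-k genes."""
    return {k: oob_error(X[:, ranking[:k]], y, n_trees, seed) for k in sizes}
```

With a real expression matrix, one would call `backward_elimination(X, y, rank_genes(X, y), elimination_sizes())` and inspect the error curve; as the text notes, the lowest error need not pick the final panel, since the larger 350-gene subset was preferred over the 120-gene one.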

Mentions: The importance of each of the 19,190 genes in predicting TRAIL responsiveness was determined by calculating the mean decrease in Gini importance, which is based on the reduction in prediction accuracy after permuting the expression values of the gene in question (referred to as Gini importance). The top fifth percentile of the Gini-importance list (the 1000 highest-ranking genes) was retained for further analysis (Figure 2A). These genes predicted TRAIL responsiveness with an out-of-bag (OOB) error of 16%. To improve the performance of the model, the bottom-ranking genes of the Gini-importance list were removed stepwise (backward elimination), the Random Forest (RF) model was rerun and its performance was assessed by calculating the OOB error (Figure 2B). This analysis showed that the top 350 genes, as well as the smaller subset of the 120 top-ranking genes, performed best, with OOB error rates of 10.1% and 8.3%, respectively. Since the contribution and importance of individual genes are likely to differ between sample types, the larger, 350-gene subset was chosen as the TRAIL-response predictor co-acting gene panel (listed in Additional file 2: Table S1). An independent dataset (NIH CellMiner) was used to determine the prediction accuracy of the 350-gene panel, and it confirmed that these genes predicted TRAIL responsiveness with high accuracy (area under the receiver operating characteristic curve, AUC = 0.84; Figure 2C). To test the relevance of these genes as predictors of TRAIL responsiveness, the sensitivity label (sensitive or resistant) was changed to the incorrect alternative in a randomly selected 50% of the cell lines and the accuracy of the model was determined again. The AUC fell to 0.48 (p < 0.05), confirming that the prediction accuracy achieved was unlikely to have occurred by chance (Figure 2C). The genes differentially expressed between sensitive and resistant cell lines were then identified and compared with the co-acting gene panel identified with RF. There were 254 genes that showed at least a 2-fold difference in expression and were considered statistically significant. Interestingly, the majority (82%) of the co-acting genes were not differentially expressed (Figure 2D).
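The AUC validation and the 50% label-swap control can be illustrated in plain Python. The sketch computes the AUC in its Mann-Whitney form, as the probability that a randomly chosen sensitive cell line receives a higher predicted score than a randomly chosen resistant one (ties count as one half), then flips the labels of a random half of the cell lines and recomputes it. The predicted scores, the 0/1 encoding (1 = sensitive), the repeat count and the function names are assumptions; the sketch also assumes labels are flipped at evaluation time against fixed model scores, which the text does not specify.

```python
import random
from statistics import mean

def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def swap_labels(labels, fraction=0.5, seed=None):
    """Flip sensitive (1) <-> resistant (0) for a random `fraction` of the cell lines."""
    rng = random.Random(seed)
    flipped = list(labels)
    for idx in rng.sample(range(len(labels)), round(fraction * len(labels))):
        flipped[idx] = 1 - flipped[idx]
    return flipped

def label_swap_control(scores, labels, repeats=100, fraction=0.5, seed=0):
    """AUCs obtained after `repeats` independent random label swaps."""
    rng = random.Random(seed)
    return [roc_auc(scores, swap_labels(labels, fraction, rng.randrange(2**32)))
            for _ in range(repeats)]

# Usage with made-up scores: 20 resistant (0) lines all scored below 20 sensitive (1) lines.
scores = [i / 40 for i in range(40)]
labels = [0] * 20 + [1] * 20
print(roc_auc(scores, labels))                     # 1.0 (perfect separation)
print(mean(label_swap_control(scores, labels)))    # expected to be near 0.5
```

Swapping half of the labels leaves each class with roughly equal numbers of high- and low-scoring lines, which is why the control settles near chance level, the reference point against which an AUC such as 0.84 can be read.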

