OncoScore: a novel, Internet-based tool to assess the oncogenic potential of genes

View Article: PubMed Central - PubMed

ABSTRACT

The complicated, evolving landscape of cancer mutations makes it a formidable challenge to identify cancer genes among the large lists of mutations typically generated in next-generation sequencing (NGS) experiments. The ability to prioritize these variants is therefore of paramount importance. To address this issue we developed OncoScore, a text-mining tool that ranks genes according to their association with cancer, based on the available biomedical literature. Receiver operating characteristic (ROC) curve and area under the curve (AUC) metrics on manually curated datasets confirmed the excellent discriminating capability of OncoScore (OncoScore cut-off threshold = 21.09; AUC = 90.3%, 95% CI: 88.1–92.5%), indicating that OncoScore provides useful results in cases where an efficient prioritization of cancer-associated genes is needed.
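
To make the cut-off and AUC figures above more concrete, the following is a minimal sketch of how a score threshold can be applied and how an AUC can be computed from labelled scores. It is not the OncoScore implementation: the gene names and scores are invented for illustration, the function names `classify` and `auc` are hypothetical, and treating a score equal to the cut-off as cancer-associated is an assumption. Only the cut-off value 21.09 comes from the abstract.

```python
from typing import Dict, Sequence

# Cut-off reported in the abstract.
ONCOSCORE_CUTOFF = 21.09


def classify(scores: Dict[str, float], cutoff: float = ONCOSCORE_CUTOFF) -> Dict[str, str]:
    """Label each gene by comparing its score to the cut-off.

    Assumption: a score exactly at the cut-off counts as cancer-associated.
    """
    return {gene: ("cancer" if s >= cutoff else "non-cancer") for gene, s in scores.items()}


def auc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties contribute 0.5, which matches the usual rank-based definition.
    """
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Toy scores and labels, invented for illustration only (NOT values from the paper).
toy_scores = {"GENE_A": 75.0, "GENE_B": 40.0, "GENE_C": 18.0, "GENE_D": 5.0, "GENE_E": 30.0}
toy_truth = {"GENE_A": True, "GENE_B": True, "GENE_C": True, "GENE_D": False, "GENE_E": False}

labels = classify(toy_scores)
pos = [toy_scores[g] for g, is_cancer in toy_truth.items() if is_cancer]
neg = [toy_scores[g] for g, is_cancer in toy_truth.items() if not is_cancer]
toy_auc = auc(pos, neg)

print(labels)
print(round(toy_auc, 3))
```

In this toy set one true cancer gene falls below the cut-off and one non-cancer gene rises above it, so the AUC is below 1; a curated benchmark would simply replace the invented dictionaries.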


Figure 4: Time-series OncoScore plot spanning from 1975 to 2016. (a) Time-series plot involving a set of manually defined cancer genes (TP53, KRAS, NRAS, HRAS, ASXL1, IDH1, IDH2, TET2 and SETBP1) and housekeeping genes (GAPDH and GUSB). The grey boxes highlight two major scientific breakthroughs that occurred during this time span. (b) Time-series plot of 10 genes randomly selected from the CGC (ARID1A, HMGA2, KIF5B, NUP214, RBM15; dashed lines) and nCan (ALMS1, DCAF17, GPD1L, WFS1, RBM10; continuous lines) datasets.
© Copyright Policy - open-access

Mentions: Given the strong impulse that NGS has given to cancer research, it is not uncommon for a gene previously considered ‘non-cancer’ to subsequently turn out to be a driver. The OncoScore of a newly discovered cancer gene will start increasing over time, as its oncogenic role is confirmed by subsequent studies. This process, however, takes a significant amount of time, causing a potential delay between the identification of the oncogene and the acquisition of a ‘driver’ OncoScore annotation. To facilitate the identification of recently discovered cancer genes, we implemented a time-series function (perform.time.series.query) that plots the OncoScore over a user-defined time window. To test this function, we generated time-series queries spanning from 1975 to 2016 (Fig. 4) and involving two different datasets: 1) a set of manually defined cancer genes (TP53, KRAS, NRAS, HRAS, ASXL1, IDH1, IDH2, TET2, SETBP1) as well as the housekeeping genes GAPDH and GUSB; 2) a set of 10 genes randomly extracted from the CGC (ARID1A, HMGA2, KIF5B, NUP214 and RBM15) and nCan (ALMS1, DCAF17, GPD1L, WFS1 and RBM10) lists. In dataset 1, at the final time point (2016), all the cancer genes under analysis scored > 60, while the two housekeeping genes reached a plateau at ~20. The dynamics of the OncoScore pattern revealed two cancer gene clusters: the first comprising oncogenes/oncosuppressors identified in the 1985-94 decade (TP53 and the RAS family), right after the development of PCR by Kary Mullis [19] and just a few years after the invention of the ‘Sanger’ sequencing technique by Frederick Sanger [20]; the second occurring right after the NGS breakthrough and comprising ASXL1, IDH1/2, TET2 and SETBP1. In particular, the behavior of the SETBP1 curve is interesting because it reflects the complex story of the discovery of SETBP1.
SETBP1 was initially identified as an oncogene (NUP98-SETBP1 fusion) in pediatric acute T-cell lymphoblastic leukemia by Panagopoulos and colleagues [21], which explains the first, sharp increase in the SETBP1 OncoScore back in 2007. Subsequently, in January 2010, Cristobal and colleagues [22] demonstrated SETBP1 overexpression as a novel leukemogenic mechanism in acute myeloid leukemia. This finding appears as a second peak in the SETBP1 time series. It was shortly followed by a seminal publication by Hoischen and colleagues [23], in which the authors demonstrated that de novo germline SETBP1 mutations are responsible for the onset of Schinzel-Giedion syndrome (SGS), a severe disorder characterized by severe mental retardation, distinctive facial features and multiple congenital malformations. Because this finding does not directly associate SETBP1 with cancer, the overall SETBP1 score decreased: between 2010 and 2013 a number of papers confirming the link between SETBP1 and SGS (and therefore counting as negative for the OncoScore) appeared in the literature. Despite this decrease, however, the SETBP1 score never fell below the OncoScore cut-off threshold, so the gene remained in the cancer-associated group. Finally, in 2013 we [24] and others [25, 26] demonstrated the occurrence of somatic, oncogenic SETBP1 point mutations in several types of cancer, which caused a new increase in the overall SETBP1 OncoScore. In the second dataset, all the CGC genes (5/507) were classified as oncogenes by OncoScore and 4 (4/302) out of 5 nCan genes (ALMS1, DCAF17, GPD1L and WFS1) were classified as ‘non-cancer’ at the final time point, as expected. The remaining nCan gene (1/302), RBM10, showed an interesting behavior: its OncoScore remained close to 0 until 2010-2011, when it rose abruptly to over 40.
Manual analysis of the literature showed a very recent association of this gene with cancer [27, 28, 29], which highlights the usefulness of OncoScore analysis for identifying recently discovered cancer-associated genes.
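
The time-series behaviour described above (a score that rises as cancer papers accumulate, dips when non-cancer papers appear, and is compared against the cut-off at each time point) can be sketched as follows. This is an illustration under stated assumptions, not the package's perform.time.series.query function: the score used here is a simplified stand-in (the cumulative percentage of a gene's papers that mention cancer), not the published OncoScore formula, and the yearly counts describe a hypothetical gene with invented numbers. The helper names are hypothetical as well.

```python
from typing import Dict, List, Optional, Tuple

CUTOFF = 21.09  # cut-off reported in the abstract


def simplified_score(cancer_papers: int, total_papers: int) -> float:
    """Stand-in score: percentage of a gene's papers that mention cancer.

    This is NOT the published OncoScore formula, only a simplified proxy.
    """
    if total_papers == 0:
        return 0.0
    return 100.0 * cancer_papers / total_papers


def score_series(yearly: List[Tuple[int, int, int]]) -> Dict[int, float]:
    """Map (year, new cancer papers, new papers in total) to a cumulative score per year."""
    series: Dict[int, float] = {}
    cancer_total = 0
    paper_total = 0
    for year, new_cancer, new_total in yearly:
        cancer_total += new_cancer
        paper_total += new_total
        series[year] = simplified_score(cancer_total, paper_total)
    return series


def first_year_at_or_above(series: Dict[int, float], cutoff: float = CUTOFF) -> Optional[int]:
    """Return the earliest year whose score reaches the cut-off, or None if it never does."""
    for year in sorted(series):
        if series[year] >= cutoff:
            return year
    return None


# Invented counts for a hypothetical gene (NOT data from the paper).
toy_counts = [
    (2009, 0, 4),   # no cancer papers yet
    (2010, 1, 4),   # first cancer report
    (2011, 5, 6),   # burst of cancer papers
    (2012, 1, 6),   # mostly non-cancer papers, so the score dips
    (2013, 8, 10),  # renewed cancer evidence
]

series = score_series(toy_counts)
crossing_year = first_year_at_or_above(series)

for year in sorted(series):
    print(year, round(series[year], 1))
print("first year at or above cut-off:", crossing_year)
```

With these invented counts the score dips in 2012 yet stays above the cut-off, which mirrors the qualitative pattern described for SETBP1; the numbers themselves carry no biological meaning.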

