The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction.

Good BM, Loguercio S, Griffith OL, Nanis M, Wu C, Su AI - JMIR Serious Games (2014)

Bottom Line: The Cure is available on the Internet. The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.


Affiliation: The Scripps Research Institute, Department of Molecular and Experimental Medicine, La Jolla, CA, United States. bgood@scripps.edu.

ABSTRACT

Background: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before.

Objective: The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from players' prior experience and from their ability to interpret text about candidate genes presented to them within the game.

Methods: We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach and used to create rankings of genes. The top genes from these rankings were evaluated in three ways: by annotation enrichment analysis, by comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10-year survival.

Results: Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet.

Conclusions: The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.
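
The voting approach mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, so the version below is only an assumption: each game contributes one vote to every gene selected in it, and genes are ranked by total votes. The function name `rank_genes_by_votes`, the tie-breaking rule, and the game records are hypothetical and are not taken from The Cure.

```python
from collections import Counter


def rank_genes_by_votes(games, top_n=None):
    """Rank genes by the number of games in which they were selected.

    `games` is an iterable of gene-symbol collections, one per played game.
    Each gene counts at most once per game; ties are broken alphabetically.
    This rule is an assumption, not the aggregation used in the paper.
    """
    votes = Counter()
    for selected in games:
        votes.update(set(selected))  # one vote per gene per game
    ranking = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return ranking if top_n is None else ranking[:top_n]


# Hypothetical game records: the genes picked in each of three games.
games = [
    ["ESR1", "ERBB2", "MKI67"],
    ["ESR1", "AURKA", "MKI67"],
    ["ESR1", "ERBB2", "TP53"],
]
top_genes = rank_genes_by_votes(games, top_n=3)
print(top_genes)
```

The top-ranked genes from such a list would then form the gene set passed on to enrichment analysis or classifier training.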


Figure 8: Evaluation of accuracy of models trained to predict ten year survival using gene sets derived from the game, and prior gene sets from the breast cancer literature. Lauss, Literature survey [27]. Vant’Veer datasets [3]. RFRS: Random Forest Relapse Score.

We conducted two experiments, each involving the development of machine learning models for predicting 10-year survival based only on gene expression information. In the first, we trained a support vector machine (SVM) classifier using gene expression data from the Metabric dataset [25] and tested it on the Oslo validation dataset generated for the Sage Dream7 breast cancer challenge [3]. In the second, we used the dataset from [4] (the Griffith dataset), with the same division of training/test data described in that publication. In both cases, we varied only the gene sets provided to the classifiers and measured the performance of each gene set as the accuracy of the SVM on the samples in the corresponding test set. Figure 8 shows that both the “expert” and “all” gene sets from the game performed comparably to the OncoType, MammaPrint, RFRS, and Attractor MetaGenes gene sets, and to gene sets selected in a literature review [26]. In fact, the “expert” gene set from The Cure had the highest accuracy on the Griffith test set and the third highest accuracy on the Oslo test set. In contrast, the 13 genes selected by the “inexperienced” players produced the worst classifier for the Oslo test set and the second worst for the Griffith test set.
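
The evaluation protocol above (fixed training and test data, with only the gene set varied) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: to stay within the Python standard library it swaps the SVM for a simple nearest-centroid classifier, and the sample layout, the gene names `G1` and `G2`, and all expression values are hypothetical.

```python
from statistics import mean


def fit_centroids(samples, genes):
    """Return {label: mean expression vector} over the chosen genes."""
    rows_by_label = {}
    for expression, label in samples:
        rows_by_label.setdefault(label, []).append([expression[g] for g in genes])
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(centroids, expression, genes):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    x = [expression[g] for g in genes]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def accuracy_for_gene_set(train, test, gene_set):
    """Train on `train` and score on `test`, using only genes in `gene_set`."""
    genes = sorted(gene_set)
    centroids = fit_centroids(train, genes)
    correct = sum(predict(centroids, expr, genes) == label for expr, label in test)
    return correct / len(test)


# Hypothetical samples: (expression by gene, survived 10 years?).
train = [
    ({"G1": 0.1, "G2": 0.9}, True),
    ({"G1": 0.2, "G2": 0.2}, True),
    ({"G1": 0.9, "G2": 0.8}, False),
    ({"G1": 0.8, "G2": 0.1}, False),
]
test = [
    ({"G1": 0.15, "G2": 0.2}, True),
    ({"G1": 0.85, "G2": 0.9}, False),
    ({"G1": 0.3, "G2": 0.85}, True),
    ({"G1": 0.7, "G2": 0.15}, False),
]

# Only the gene set changes between runs; the data split stays fixed.
gene_sets = {"informative": {"G1"}, "noise": {"G2"}}
results = {
    name: accuracy_for_gene_set(train, test, genes)
    for name, genes in gene_sets.items()
}
print(results)
```

Holding the split and the classifier fixed means that any difference in accuracy can be attributed to the gene set alone, which is the comparison Figure 8 reports.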

