Understanding the limits of animal models as predictors of human biology: lessons learned from the sbv IMPROVER Species Translation Challenge.

Rhrissorrakrai K, Belcastro V, Bilal E, Norel R, Poussin C, Mathis C, Dulize RH, Ivanov NV, Alexopoulos L, Rice JJ, Peitsch MC, Stolovitzky G, Meyer P, Hoeng J - Bioinformatics (2014)

Bottom Line: Participating teams submitted 49 different solutions across the sub-challenges, two-thirds of which were statistically significantly better than random. Additionally, similar computational methods were found to range widely in their performance within the same challenge, and no single method emerged as a clear winner across all sub-challenges. Finally, computational methods were able to effectively translate some specific stimuli and biological processes in the lung epithelial system, such as DNA synthesis, cytoskeleton and extracellular matrix, translation, immune/inflammation and growth factor/proliferation pathways, better than the expected response similarity between species. Contact: pmeyerr@us.ibm.com or Julia.Hoeng@pmi.com. Supplementary data are available at Bioinformatics online.

Affiliation: IBM T.J. Watson Research Center, Computational Biology Center, Yorktown Heights, NY 10003, USA, Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland, Telethon Institute of Genetics and Medicine, Via Pietro Castellino, 111, 80131 Naples, Italy, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki and National Technical University of Athens, Heroon Polytechniou 9, Zografou 15780, Greece.

Fig. 2. Scores and computational methods used for solving the STC. The hypothesis simulation was used to compute and plot team Z-scores of the area under the precision-recall (AUPR) curve, balanced accuracy (BAC) and Pearson correlation coefficient (PCC) for SC1 (A), SC2 (B) and SC3 (C). Z-scores are used to compare the apparent difficulty of each of the sub-challenges. Panels (D–G) reflect actual performance differences, as measured by overall rank across the three metrics, for different methodological approaches. Teams' rank distributions are plotted separately by type of approach for SC1 (D), SC2 (E) and SC3 (F). (G) In SC2, teams' rank distribution is separated by use of protein phosphorylation data alone or in combination with gene expression data. SVM: support vector machines; Trees: random forest and other tree-based methods; NN: neural networks; GA: genetic algorithm.

Mentions: The overall success of participants in a sub-challenge can be measured by the median Z-score of the scoring metrics, which may be used to quantify the amount of predictive signal available in the provided data for a given classification problem. Z-scores offer a useful cross-challenge measure because they account for size differences in the universe of predictions. This matters because participants had to predict the activity of 16 × 26 phosphoprotein–stimulus pairs (SC1 and SC2) and 246 × 26 gene set–stimulus pairs (SC3). Comparisons of the Z-scores for the three metrics in Figure 2A–C suggest that protein phosphorylation was easier to translate across species (SC2) than to predict within a species from gene expression (GEx) alone (SC1), as reflected by higher Z-scores for AUPR and PCC. Inter-species protein phosphorylation also appeared easier to translate than inter-species pathway activation (SC3), as supported by the lower AUPR and PCC Z-scores for SC3 compared with SC2. The Z-scores for all three sub-challenges were tied for BAC (Fig. 2A–C).
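
The Z-score idea described above can be illustrated with a minimal sketch. It is not the challenge's scoring code: the excerpt does not describe how the hypothesis simulation was run, so the sketch assumes a null distribution obtained by randomly permuting a team's predictions against a fixed gold standard. The function names, the simulated gold standard and the noise level are all hypothetical, and only BAC and PCC are shown for brevity.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])     # fraction of positives recovered
    specificity = np.mean(~y_pred[~y_true])   # fraction of negatives recovered
    return 0.5 * (sensitivity + specificity)

def pearson(y_true, scores):
    """Pearson correlation between gold labels and prediction confidences."""
    return np.corrcoef(y_true, scores)[0, 1]

def null_z_score(metric, y_true, submission, n_null=500, seed=0):
    """Z-score of a submission's metric against a permutation null.

    ASSUMPTION: the null model shuffles the submission's own predictions,
    which breaks any link to the gold standard but keeps their distribution.
    """
    rng = np.random.default_rng(seed)
    observed = metric(y_true, submission)
    null = np.array([metric(y_true, rng.permutation(submission))
                     for _ in range(n_null)])
    return (observed - null.mean()) / null.std(ddof=1)

# Hypothetical gold standard sized like SC1/SC2: 16 x 26 = 416 pairs.
rng = np.random.default_rng(1)
gold = (rng.random(16 * 26) < 0.2).astype(int)

# Hypothetical team submission: gold signal plus Gaussian noise.
scores = gold + 0.5 * rng.standard_normal(gold.size)

z_pcc = null_z_score(pearson, gold, scores)
z_bac = null_z_score(balanced_accuracy, gold, scores > 0.5)
print(f"PCC Z-score: {z_pcc:.1f}, BAC Z-score: {z_bac:.1f}")
```

Because each Z-score is expressed in units of the null distribution's spread, values computed on 416 pairs and on 246 × 26 pairs can be placed on a common scale. Under the same assumptions, a per-sub-challenge summary would be `np.median` over the teams' Z-scores.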

