Understanding the limits of animal models as predictors of human biology: lessons learned from the sbv IMPROVER Species Translation Challenge.

Rhrissorrakrai K, Belcastro V, Bilal E, Norel R, Poussin C, Mathis C, Dulize RH, Ivanov NV, Alexopoulos L, Rice JJ, Peitsch MC, Stolovitzky G, Meyer P, Hoeng J - Bioinformatics (2014)

Bottom Line: Participating teams submitted 49 different solutions across the sub-challenges, two-thirds of which were statistically significantly better than random. Additionally, similar computational methods were found to range widely in their performance within the same challenge, and no single method emerged as a clear winner across all sub-challenges. Finally, computational methods were able to effectively translate some specific stimuli and biological processes in the lung epithelial system, such as DNA synthesis, cytoskeleton and extracellular matrix, translation, immune/inflammation and growth factor/proliferation pathways, better than the expected response similarity between species. Contact: pmeyerr@us.ibm.com or Julia.Hoeng@pmi.com. Supplementary data are available at Bioinformatics online.


Affiliation: IBM T.J. Watson Research Center, Computational Biology Center, Yorktown Heights, NY 10003, USA, Philip Morris International R&D, Philip Morris Products S.A., 2000 Neuchâtel, Switzerland, Telethon Institute of Genetics and Medicine, Via Pietro Castellino, 111, 80131 Naples, Italy, ProtATonce Ltd, Scientific Park Lefkippos, Patriarchou Grigoriou & Neapoleos 15343 Ag. Paraskevi, Attiki and National Technical University of Athens, Heroon Polytechniou 9, Zografou 15780, Greece.


Fig. 5. Best translated gene sets representative of different pathways. (A) Histogram of the percentage of active gene set/stimulus pairs [560 pairs from 6396 (246 gene sets × 26 stimuli)] correctly predicted by N teams. The blue line represents the cumulative sum of the histogram values. (B) Distribution of teams’ Prg (blue) and Prs (red) values. (C and D) Best predicted gene sets as measured by Prg. (C) Barplot of the 25 gene sets having a Prg Z-score ≥ 1.9. A blue star indicates an Sg Z-score ≥ 1.5. All gene sets are originally derived from Reactome unless otherwise indicated, according to MSigDB. (D) Hierarchical clustering of gene sets and genes that are present in at least 4 of the top 25 best predicted gene sets. Each cell is valued according to gene set membership and how often the gene is found as part of that gene set’s GSEA CORE enrichment set. Gene/gene set pairs are assigned 0 if the gene is not a member, 1 if it is only a member, or 1 + C, where C is the number of stimuli under which the gene is found to be part of the CORE enrichment. Cells have a theoretical maximum value of 27. Cells are represented by a blue scale ranging from dark blue for 0 to white for the maximum value reached, here 7. Significantly overrepresented genes among these gene sets are labeled red (P-value < 0.01) or yellow (P-value < 0.05).
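The cell scoring rule described for panel D can be sketched in a few lines. This is only an illustration of the rule as stated in the caption, not the authors' code: the function name `cell_value`, the gene and stimulus names, and the dictionary layout are all hypothetical.

```python
from typing import Dict, Set


def cell_value(gene: str,
               gene_set_members: Set[str],
               core_by_stimulus: Dict[str, Set[str]]) -> int:
    """Score one gene/gene set pair as described in the caption.

    gene_set_members: genes belonging to the gene set (hypothetical layout).
    core_by_stimulus: for each stimulus, the genes in that gene set's
        GSEA CORE enrichment set under that stimulus (hypothetical layout).
    """
    if gene not in gene_set_members:
        return 0  # not a member of the gene set
    # C = number of stimuli under which the gene is in the CORE enrichment
    c = sum(1 for core in core_by_stimulus.values() if gene in core)
    return 1 + c  # equals 1 when the gene is only a member (C = 0)


# Toy data, made up for illustration only.
members = {"GENE_A", "GENE_B"}
core = {"stim1": {"GENE_A"}, "stim2": {"GENE_A"}, "stim3": set()}

print(cell_value("GENE_A", members, core))  # member, CORE under 2 stimuli
print(cell_value("GENE_B", members, core))  # member only
print(cell_value("GENE_C", members, core))  # not a member
```

With 26 stimuli, a gene that is in the CORE enrichment under every stimulus scores 1 + 26 = 27, which matches the theoretical maximum given in the caption.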


Mentions: Figure 4A and B show the mean Prg and Prp for all participants plotted against Sg and Sp, respectively, based on activation in stimuli. A total of 49 of 246 gene sets were predicted better than expected by Sg (Prg > Sg > 0, Fig. 4A). Prediction performance per phosphoprotein, Prp, showed that a ribosomal protein S6 kinase (KS6A1) and mitogen-activated protein kinases (MK09 and MP2K6) were predicted better than expected by Sp (Fig. 4B). Although aggregating all teams’ results did not yield a better overall prediction for protein phosphorylation activity, the aggregate of the five best teams performed better than individual predictions (Supplementary Fig. S5B). The high correlation between Prp and Sp (PCC = 0.71, P-value < 0.0087) indicates that most of the pathways defined by protein phosphorylation activation were predicted with the accuracy expected from species similarity. We observed a similar situation for gene set activation prediction, with a lower but still significant correlation (PCC = 0.38, P-value < 1e-6). These results again suggested a slightly higher predictability in the protein phosphorylation data, though the prediction space was smaller. Examining the individual team values for Prp and Prg, we found that participants’ predictions were well translated for 71 of 176 active gene sets and for 8 of 16 phosphorylated proteins (Fig. 4A and B). Overall, a higher percentage of teams performed better than species similarity when predicting protein phosphorylation activation (55%) than when predicting gene set activation (41%; see Fig. 4C and D). Nevertheless, when looking specifically at the set of active gene set and stimulus pairs (n = 560), 30% were correctly predicted by at least three teams (Fig. 5A), and in contrast to phosphorylation activation, six of seven teams in SC3 were better at globally translating the effects of stimuli than gene set activity (Fig. 5B).
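The two comparisons used in this passage, counting items predicted better than species similarity (Prg > Sg > 0) and correlating prediction performance with similarity (PCC), can be sketched as follows. This is a minimal illustration under stated assumptions: how Prg and Sg are computed is not defined in this excerpt, so they are treated as given per-gene-set scores; the function names and all numeric values are hypothetical toy data, not results from the challenge.

```python
import math
from typing import Dict, List, Sequence


def better_than_similarity(pr: Dict[str, float],
                           s: Dict[str, float]) -> List[str]:
    """Return items whose prediction score exceeds a positive similarity score.

    Implements the condition Pr > S > 0 quoted in the text.
    """
    return [name for name in pr if pr[name] > s[name] > 0]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy scores, made up for illustration only.
prg = {"set_a": 0.6, "set_b": 0.2, "set_c": 0.5}
sg = {"set_a": 0.4, "set_b": 0.3, "set_c": -0.1}

winners = better_than_similarity(prg, sg)
names = sorted(prg)
pcc = pearson([prg[n] for n in names], [sg[n] for n in names])
print(winners, round(pcc, 3))
```

In the toy data only `set_a` satisfies the condition: `set_c` has a higher prediction score than similarity score, but its similarity is not positive, so it is excluded by the `> 0` part of the rule.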

