Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling.

Diaz-Uriarte R - BMC Bioinformatics (2015)

Bottom Line: Embedding order restrictions within evolutionary models of tumor progression that included passengers allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Single cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects.


Affiliation: Dept. Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Arzobispo Morcillo, 4, 28029, Madrid, Spain. ramon.diaz@iib.uam.es.

ABSTRACT

Background: Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. The purpose of this study is to conduct a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. I used simulated data sets (where the true restrictions are known) but, in contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions.

Results: Poor choices of method, filtering, and sampling lead to large errors in all performance measures. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy-to-use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance.

Conclusions: This paper provides practical recommendations for using these methods with experimental data. It also identifies key areas of future methodological work and, in particular, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.



Figure 2: Drivers known: plot of the coefficients (posterior mean and 0.025 and 0.975 quantiles) for Conjunction, Method, S.Time, S.Type and S.Size from the GLMMs for each performance measure. The x-axis is labeled by the exponential of the coefficient (i.e., relative change in the odds ratio or in the scale of the Poisson parameter for Diff): smaller (further left) is better. The vertical dashed line denotes no change relative to the overall mean (the intercept). The x-axis has been scaled to make it symmetric (e.g., a ratio of 1.25 is the same distance from the vertical line as a ratio of 1/1.25). Coefficients that correspond to a change larger than 25% (i.e., ratio > 1.25 or < 1/1.25) are shown as larger red dots. The coefficients shown are only those that represent a change larger than 25% for at least one performance measure, or coefficients that are marginal to those shown (e.g., any main effect from an interaction that includes it).
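The symmetric scaling and the 25% threshold described in the caption can be illustrated with a short sketch. It assumes that the symmetry is obtained by placing ratios on a logarithmic axis, which the caption does not state explicitly; the function names and the coefficient values are hypothetical and are not taken from the paper.

```python
import math

def log_ratio_position(ratio):
    """Position of a ratio on a log-scaled axis; 0 means no change."""
    return math.log(ratio)

def is_large_change(coef, threshold=1.25):
    """True if exp(coef) is above threshold or below 1/threshold."""
    return abs(coef) > math.log(threshold)

# On a log axis, 1.25 and 1/1.25 sit at equal distances from the "no change" line.
right = log_ratio_position(1.25)
left = log_ratio_position(1 / 1.25)

# Hypothetical coefficients on the model (log) scale, for illustration only.
coefs = {"method_a": 0.40, "method_b": -0.10, "method_c": -0.30}
flagged = {name: is_large_change(c) for name, c in coefs.items()}
```

Under this assumption, a coefficient is flagged exactly when its absolute value exceeds log(1.25), so increases and decreases of the same relative size are treated alike.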

Mentions: Figures 2 and 3 show the coefficients from the GLMM fits. From Figure 2 we see that DiP and DiP-A only performed better than the average of methods with respect to performance measure FPF (which, as mentioned before, is of minor value compared to PND and PFD), and CBN and CBN-A only with respect to performance measure PND. However, for performance measure PND the better performance of CBN/CBN-A compared to other methods was concentrated in graphs with conjunctions. The left column for PND in Table 3 shows that the best five methods were all CBN/CBN-A, but the right column for PND shows that OT-A occupies the first three and fifth positions. The analysis of frequencies of confidence sets, in Table 4, again reveals the same patterns: OT and OT-A were clearly the best methods for performance measures Diff and PFD, and were, together with DiP/DiP-A, the best methods for performance measure FPF (again, FPF is of minor relevance compared to PND and PFD). CBN and CBN-A were in confidence sets that did not include any of the other methods in 67% of the cases for performance measure PND in graphs with conjunctions. In the absence of conjunctions, however, confidence sets that did not include CBN/CBN-A were more prevalent than those that included CBN/CBN-A. It should be expected that even the best performance of OT/OT-A in graphs with conjunctions cannot be perfect; in these cases we should only see perfect performance, if at all, with CBN/CBN-A or DiP/DiP-A. Two extreme cases (which also provide an internal consistency check) are graphs “7-A” and “11-A” (both have conjunctions): perfect performance was achieved for the first with CBN-A and for the second with DiP (S.Size = 1000, McF_6, sh Inf and 0, S.Time unif and last, respectively; see Additional file 6).
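Why a tree-based method cannot be perfect on graphs with conjunctions can be sketched as follows. This is only an illustration under stated assumptions: restrictions are encoded as a mapping from each mutation to the set of mutations it requires, a conjunction is taken to mean that all parents must be present, and the four-mutation graph and the function names are hypothetical (they are not graphs “7-A” or “11-A” from the paper).

```python
def has_conjunction(parents):
    """True if any mutation depends on more than one parent mutation."""
    return any(len(p) > 1 for p in parents.values())

def genotype_allowed(genotype, parents):
    """True if every mutation in the genotype has all of its parents present."""
    return all(parents.get(m, set()) <= genotype for m in genotype)

# Hypothetical restrictions: B requires A; D requires both B and C.
true_graph = {"B": {"A"}, "D": {"B", "C"}}

# A tree gives each mutation at most one parent, so D can keep only one of them.
tree_graph = {"B": {"A"}, "D": {"B"}}

genotype = {"A", "B", "D"}  # D is present without C
allowed_by_truth = genotype_allowed(genotype, true_graph)
allowed_by_tree = genotype_allowed(genotype, tree_graph)
```

Here the tree accepts a genotype that the true restrictions forbid, because the dependence of D on C has no place in a single-parent structure.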

