Methods and challenges in timing chromosomal abnormalities within cancer samples.

Purdom E, Ho C, Grasso CS, Quist MJ, Cho RJ, Spellman P - Bioinformatics (2013)

Bottom Line: We show that the model for timing chromosomal amplifications is limited in scope, particularly for regions with high levels of amplification. We propose a maximum-likelihood estimation procedure that fully accounts for sequencing variability and show that it outperforms the partial maximum-likelihood estimation method. We also propose a Bayesian estimation procedure that stabilizes the estimates in certain settings.


Affiliation: Department of Statistics, University of California, Berkeley, 367 Evans Hall Berkeley, CA 94720-3860, USA, Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA and Department of Dermatology, University of California, San Francisco, CA 94115, USA.

ABSTRACT

Motivation: Tumors acquire many chromosomal amplifications, and those acquired early in the lifespan of the tumor may not only be important for tumor growth but may also be useful for diagnostic purposes. Many methods infer the order in which abnormalities accumulate based on their occurrence in a large cohort of patients. Recently, Durinck et al. (2011) and Greenman et al. (2012) developed methods to order a single tumor's chromosomal amplifications based on the patterns of mutations accumulated within those regions. This approach offers an unprecedented opportunity to assess the etiology of a single tumor sample, but has not been widely evaluated.

Results: We show that the model for timing chromosomal amplifications is limited in scope, particularly for regions with high levels of amplification. We also show that the estimation of the order of events can be sensitive for events that occur early in the progression of the tumor and that the partial maximum-likelihood method of Greenman et al. (2012) can give biased estimates, particularly for moderate read coverage or normal contamination. We propose a maximum-likelihood estimation procedure that fully accounts for sequencing variability and show that it outperforms the partial maximum-likelihood estimation method. We also propose a Bayesian estimation procedure that stabilizes the estimates in certain settings. We apply these methods to a small number of ovarian tumors, and the results suggest possible differences in how the tumors acquired amplifications.

Availability and implementation: We provide implementation of these methods in an R package cancerTiming, which is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/.


Fig. 2 (btt546-F2): Boxplots based on simulated data in the single-gain case. (a) Read depth of 30× and no normal contamination. (b) Read depth of 75× and 30% normal contamination. The number of mutations, N, was fixed at 125.

Mentions: Full Maximum Likelihood. We expect the difference between the partial MLE method of Greenman et al. (2012) and our full MLE method to be largest when classifying mutated locations to a particular allele frequency is most uncertain, that is, at lower read coverage and/or higher levels of normal contamination. Simulation results show that with no normal contamination, the partial MLE method can be biased even in the relatively simple single-gain case with read coverage as high as 30× (Fig. 2). At 75× coverage the two methods are indistinguishable for low numbers of events, but for larger K the partial MLE remains biased even at 75× coverage (see Supplementary Figure S8). In particular, the partial MLE method overestimates the timing parameter when its true value is small, and underestimates it when the true value is large. Even where the full MLE tends to be biased and underestimates the parameter, the partial MLE errs in the other direction and overestimates it by a larger margin (and conversely for large values), resulting in worse average error (Fig. 3). When normal contamination increases, the partial MLE does worse, so that even for 75× coverage and K = 1, estimation of moderately low values of the parameter is still noticeably biased (Figs 2 and 3). For large K, where the allele frequencies are closer together and harder to distinguish, the problems are magnified across a wide range of parameter values, and larger coverage is required before the bias disappears (see Supplementary Figure S6).
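The contrast described above, between first classifying each mutation to an allele frequency and maximizing a likelihood that keeps the sequencing variability, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the estimators of the paper or of the cancerTiming package: it assumes a single gain giving 3 tumor copies, a diploid normal contaminant, a fixed read depth at every mutation, and it estimates only the mixing proportion q of mutations carried on 2 copies rather than the timing parameter itself. All function names are hypothetical.

```python
import math
import random


def allele_freq(mut_copies, contamination, tumor_copies=3, normal_copies=2):
    """Expected variant allele frequency for a mutation carried on
    `mut_copies` of `tumor_copies` tumor copies, diluted by normal cells."""
    tumor = 1.0 - contamination
    return mut_copies * tumor / (tumor_copies * tumor + normal_copies * contamination)


def simulate_counts(n_mut, q_high, depth, contamination, rng):
    """Draw variant read counts; a fraction q_high of mutations sit on 2 copies."""
    p_hi, p_lo = allele_freq(2, contamination), allele_freq(1, contamination)
    counts = []
    for _ in range(n_mut):
        p = p_hi if rng.random() < q_high else p_lo
        # Binomial(depth, p) draw as a sum of Bernoulli trials (assumed fixed depth).
        counts.append(sum(rng.random() < p for _ in range(depth)))
    return counts


def _log_liks(x, depth, contamination):
    """Binomial log-likelihoods (up to a shared constant) as (high, low)."""
    p_hi, p_lo = allele_freq(2, contamination), allele_freq(1, contamination)
    ll_hi = x * math.log(p_hi) + (depth - x) * math.log(1.0 - p_hi)
    ll_lo = x * math.log(p_lo) + (depth - x) * math.log(1.0 - p_lo)
    return ll_hi, ll_lo


def hard_estimate(counts, depth, contamination):
    """Classify each mutation to its likelier frequency, then take the fraction.
    This is an illustrative stand-in for classify-then-estimate approaches."""
    n_high = 0
    for x in counts:
        ll_hi, ll_lo = _log_liks(x, depth, contamination)
        n_high += ll_hi > ll_lo
    return n_high / len(counts)


def mixture_mle(counts, depth, contamination, grid=199):
    """Maximize the two-component binomial mixture likelihood over q
    by grid search on the open interval (0, 1)."""
    terms = []
    for x in counts:
        ll_hi, ll_lo = _log_liks(x, depth, contamination)
        m = max(ll_hi, ll_lo)  # factor out the larger term for numerical stability
        terms.append((m, math.exp(ll_hi - m), math.exp(ll_lo - m)))
    best_q, best_ll = None, -math.inf
    for i in range(1, grid + 1):
        q = i / (grid + 1)
        ll = sum(m + math.log(q * a + (1.0 - q) * b) for m, a, b in terms)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


if __name__ == "__main__":
    rng = random.Random(1)
    true_q, n_mut = 0.1, 125
    for depth, contamination in [(30, 0.0), (75, 0.3)]:
        hard, full = [], []
        for _ in range(20):
            counts = simulate_counts(n_mut, true_q, depth, contamination, rng)
            hard.append(hard_estimate(counts, depth, contamination))
            full.append(mixture_mle(counts, depth, contamination))
        print(depth, contamination,
              "mean hard:", sum(hard) / len(hard),
              "mean mixture:", sum(full) / len(full))
```

Varying `depth`, `contamination` and `true_q` in the demo loop gives a rough feel for when the two estimates diverge. The grid search is chosen for transparency; an EM iteration or a one-dimensional optimizer would serve equally well in this toy setting.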

