PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors.

Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q - Genome Biol. (2015)



ABSTRACT
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.


Figure 5. PhyloWGS run time. Relationship between the number of SSMs in the simulated dataset with five subpopulations and the run time, plotted with log10 scales on both axes. Run time was measured on a single core of an Intel i7-4770K with 2,500 MCMC iterations and 5,000 inner Metropolis–Hastings iterations. Run time can be greatly reduced by parallelizing the sampling or by taking fewer samples; however, the implications of these options have not been explored. SSM, simple somatic mutation.
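Because Figure 5 uses log10 scales on both axes, a roughly straight trend would mean that run time grows approximately as a power of the number of SSMs, with the slope of the line giving the exponent. The hypothetical helper below (not part of PhyloWGS) sketches how that slope could be estimated by ordinary least squares on the log-transformed values; any numbers passed to it here are made up, not measurements from Figure 5.

```python
import math


def loglog_slope(n_values, run_times):
    """Least-squares slope of log10(run time) against log10(n).

    A straight line on log10-log10 axes corresponds to run_time ~ c * n**slope,
    so the fitted slope is the empirical scaling exponent.
    """
    xs = [math.log10(n) for n in n_values]
    ys = [math.log10(t) for t in run_times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic run times that grow exactly as n**1.5 (illustration only).
    ns = [100, 200, 500, 1000]
    print(loglog_slope(ns, [0.01 * n ** 1.5 for n in ns]))
```

With these synthetic inputs the fitted exponent is 1.5 up to floating-point rounding; applied to real timing data, the fit is only meaningful if the points actually lie near a straight line.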

An important question in the subclonal analysis of tumor samples is how deeply a sample must be sequenced to recover its subclonal structure. To answer this question, we applied PhyloWGS to simulated read counts with known subclonal structure. Our simulations covered a range of total population counts (3, 4, 5 and 6), read depths (20, 30, 50, 70, 100, 200 and 300) and numbers of SSMs per population (5, 10, 25, 50, 100, 200, 500 and 1,000). For each combination of population count, read depth and number of SSMs per population, we generated simulated tumor data whose subclonal population frequencies were consistent with both branching and linear phylogenies. For each simulated SSM $k$ in subpopulation $u$, the number of reference allele reads $a_k$ was drawn as:

$$ a_{k} \sim \text{Binomial}\left(d_{k},\, 1-\phi_{u} + 0.5\,\phi_{u}\right); \quad d_{k} \sim \text{Poisson}(r), $$

where $d_k$ is the total number of reads covering SSM $k$, $\phi_u$ is the clonal frequency of population $u$ and $r$ is the simulated read depth. The reference-read probability simplifies to $1 - \phi_u/2$: the fraction $1-\phi_u$ of cells lacking the mutation contributes only reference reads, and the heterozygous mutant cells contribute reference reads half the time. The $\phi$ values used for the simulations can be found in Table 3. First, we examined the time needed to complete sampling as a function of the number of SSMs (Figure 5). On average, inference on datasets with up to 1,000 SSMs completed in less than 3 hours on a single core of an Intel i7-4770K (all timing data shown use the simulated dataset with five subpopulations).
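The generative model above is simple enough to sketch directly. The Python below is a minimal, hypothetical re-implementation using only the standard library, not the authors' simulation code: it enumerates the 4 × 7 × 8 = 224 settings of the simulation grid and draws $d_k$ and $a_k$ for each SSM. The Poisson and binomial samplers are naive textbook methods chosen to keep the example self-contained, the helper names are illustrative, and the clonal frequencies in the usage block are placeholders rather than the Table 3 values. Tree structure (branching versus linear) is not modeled here; it would only constrain which $\phi_u$ values are chosen.

```python
import itertools
import math
import random

# Grid of simulation settings listed in the text.
POPULATION_COUNTS = [3, 4, 5, 6]
READ_DEPTHS = [20, 30, 50, 70, 100, 200, 300]
SSMS_PER_POPULATION = [5, 10, 25, 50, 100, 200, 500, 1000]

# Every (population count, read depth, SSMs per population) combination: 4 * 7 * 8 = 224.
GRID = list(itertools.product(POPULATION_COUNTS, READ_DEPTHS, SSMS_PER_POPULATION))


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (adequate for these depths)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_binomial(n, p, rng):
    """Draw from Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_ssm(phi_u, r, rng):
    """Return (a_k, d_k) for one SSM in a subpopulation with clonal frequency phi_u."""
    d_k = sample_poisson(r, rng)       # total reads: d_k ~ Poisson(r)
    p_ref = 1 - phi_u + 0.5 * phi_u    # reference-read probability, i.e. 1 - phi_u / 2
    a_k = sample_binomial(d_k, p_ref, rng)
    return a_k, d_k


def simulate_dataset(phis, r, ssms_per_pop, seed=0):
    """Return (u, a_k, d_k) tuples: ssms_per_pop SSMs for each clonal frequency in phis."""
    rng = random.Random(seed)
    return [
        (u, *simulate_ssm(phi_u, r, rng))
        for u, phi_u in enumerate(phis)
        for _ in range(ssms_per_pop)
    ]


if __name__ == "__main__":
    # Placeholder frequencies for illustration only; the paper's values are in Table 3.
    data = simulate_dataset([1.0, 0.6, 0.3], r=100, ssms_per_pop=5, seed=1)
    for u, a_k, d_k in data:
        vaf = 1 - a_k / d_k if d_k else None  # variant allele frequency b_k / d_k
        print(u, a_k, d_k, vaf)
```

Pooled over many SSMs, the variant allele frequency of subpopulation $u$ should approach $\phi_u/2$, which is the signal PhyloWGS has to recover; at low $r$ or with few SSMs per population the binomial noise around that value grows, which is what the depth sweep probes.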

