PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors.

Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q - Genome Biol. (2015)

ABSTRACT
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.


Figure 6: Recovering the true number of clusters. Each panel shows the relationship between the number of SSMs per cluster, the read depth and the ability of PhyloWGS to recover the true number of populations for simulations with three, four, five or six populations. The error is calculated by subtracting the true number of subclonal lineages from the number found. SSM, simple somatic mutation.

To determine the number of subpopulations our algorithm found, we analyzed the sampled tree with the highest complete data likelihood and removed any subpopulations with zero assigned SSMs. We then computed the difference between the number of subpopulations identified by our algorithm and the number used to generate the data. The results of this comparison for the ambiguous phylogeny simulations are shown in Figure 6. Several relationships between the simulation parameters and the output of our model can be observed. First, unsurprisingly, increasing the read depth and decreasing the number of subpopulations both increased the accuracy of the estimated number of subpopulations. Second, for the ambiguous phylogeny simulations, there is an inverted-U-shaped relationship between accuracy and the number of SSMs characterizing each population: accuracy first increases and then decreases as the number of SSMs increases. This decrease in accuracy at high numbers of SSMs is counterintuitive, since more SSMs provide more information with which to perform inference. However, the Dirichlet process prior sometimes overestimates the number of source components [39]. While this overestimation has not been demonstrated for the tree-structured stick-breaking process prior used by PhyloWGS, the similarity between the two processes makes it likely that the same effect occurs here. Some of these errors could be eliminated by ad hoc removal of clusters with a small number of SSMs, but there is not yet a consistent approach for doing so, so we leave the results untouched. These results suggest that for three or four subpopulations, a read depth consistent with typical WGS experiments (20× to 30×) is sufficient to identify the correct number of subpopulations, while read depths of 200× to 300× are needed to resolve tumors with up to six subpopulations.
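
The counting procedure described in this paragraph can be illustrated with a minimal Python sketch. It is not taken from the PhyloWGS code base: the SampledTree container, the function names and the min_ssms threshold are hypothetical, introduced only for illustration. The sketch picks the sampled tree with the highest complete data likelihood, counts the subpopulations that have at least one assigned SSM, and reports the error as the number found minus the true number, as in the Figure 6 caption. Whether the dictionary includes the normal (root) population is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampledTree:
    """One sampled tree (hypothetical container, not the PhyloWGS output format)."""
    log_likelihood: float          # complete data log likelihood of this sample
    ssms_per_node: Dict[str, int]  # subpopulation id -> number of assigned SSMs


def count_subpopulations(trees: List[SampledTree], min_ssms: int = 1) -> int:
    """Count subpopulations in the highest-likelihood tree.

    With min_ssms=1 this drops only subpopulations with zero assigned SSMs.
    A larger min_ssms mimics the ad hoc removal of small clusters, which the
    text mentions but does not apply.
    """
    if not trees:
        raise ValueError("need at least one sampled tree")
    best = max(trees, key=lambda t: t.log_likelihood)
    return sum(1 for n in best.ssms_per_node.values() if n >= min_ssms)


def cluster_count_error(trees: List[SampledTree], true_count: int,
                        min_ssms: int = 1) -> int:
    """Error = number of subpopulations found minus the true number."""
    return count_subpopulations(trees, min_ssms) - true_count


# Toy input (made-up numbers, for illustration only).
trees = [
    SampledTree(-1050.0, {"a": 40, "b": 0, "c": 25}),
    SampledTree(-1002.5, {"a": 38, "b": 2, "c": 25, "d": 0}),
]

print(cluster_count_error(trees, true_count=2))              # 1 (one extra cluster)
print(cluster_count_error(trees, true_count=2, min_ssms=5))  # 0 (small cluster dropped)
```

In the toy input, the second tree has the higher likelihood, and its two-SSM cluster produces an overestimate of one. Raising the hypothetical min_ssms threshold removes it, which shows why such pruning is tempting and also why it is ad hoc: the result depends entirely on the chosen cutoff.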

