PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors.

Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q - Genome Biol. (2015)

Bottom Line: Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods.


ABSTRACT
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.


Figure 10: Expert-generated and inferred phylogenies for patient CLL077 with chronic lymphocytic leukemia. Left: The expert-generated phylogeny based on targeted deep-sequencing data. Right: The phylogeny inferred by PhyloWGS on allele frequencies of the same SSMs found using WGS. The subclonal lineage population frequencies for the five samples and the SSM assignments of lineages are also shown in the figure. SSM, simple somatic mutation; WGS, whole-genome sequencing.

Mentions: Next, we applied PhyloWGS to data from patient CLL077, extracted from Supplementary Table 7 of a paper describing a chronic lymphocytic leukemia dataset [11] (available as accession [EGAD:00001000972]). For this patient, five tumor samples were collected over the course of treatment. We note that our method does not assume or use any temporal relationships in multiple-sample data and could equally be applied to multiple samples collected simultaneously. We have previously reported experiments using the targeted resequencing data, with an average read depth of 100,000× at 17 identified SSMs [17]; here we instead use the WGS data for that same set of mutations, with an average read depth of 40×. Examining the numbers of reference and variant alleles made it clear that the mutation in gene SAMHD1 was at a location that was homozygous in the cancerous subpopulation containing it, because the proportion of variant reads was far above 50% (the expected variant allele proportion for a heterozygous SSM present in every cell of the sample). We simulated the data that a CNV algorithm would find by assuming that the copy number at that location was one in a CNV-defined subpopulation and that the proportion of cells in that population was the same as implied by halving the proportion of variant alleles. After running PhyloWGS on these data, we compared the maximum data likelihood tree with the expert-generated tree found using a semi-manual method and targeted resequencing data (Figure 10). The two trees are nearly identical; the only difference is that PhyloWGS assigns a single SSM to a child of the subpopulation in which it is found in the expert tree. In Additional file 2, we show the top 50 sampled trees, ranked by their posterior probabilities.
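The reasoning about the SAMHD1 locus can be illustrated with a small sketch. It is not PhyloWGS code: the function names and the read counts are hypothetical, sampling noise is ignored, and the last function encodes only one possible reading of "halving the proportion of variant alleles" (halve the observed VAF, then convert it to a cell fraction as if the SSM were heterozygous in a diploid region).

```python
def vaf(variant_reads: int, reference_reads: int) -> float:
    """Variant allele frequency: variant reads over all reads at the locus."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("locus has no reads")
    return variant_reads / total


def heterozygous_cell_fraction(v: float) -> float:
    """Fraction of cells carrying the SSM if it is heterozygous in a
    diploid region with no copy number change: each carrier cell
    contributes one variant allele out of two, so VAF = fraction / 2."""
    return 2.0 * v


def consistent_with_heterozygous(v: float) -> bool:
    """A heterozygous diploid SSM cannot exceed VAF 0.5, because the
    implied cell fraction would exceed 1 (noise is ignored here)."""
    return heterozygous_cell_fraction(v) <= 1.0


def simulated_cnv_cell_fraction(v: float) -> float:
    """ASSUMPTION: one reading of the halving heuristic in the text.
    Halve the observed VAF, then treat the result as a heterozygous VAF."""
    return heterozygous_cell_fraction(v / 2.0)


# Hypothetical counts at roughly 40x depth; not taken from the dataset.
observed = vaf(variant_reads=30, reference_reads=10)
print(observed)                                # VAF of the hypothetical locus
print(consistent_with_heterozygous(observed))  # heterozygous diploid plausible?
print(simulated_cnv_cell_fraction(observed))   # cell fraction under the assumption
```

With real data the comparison against 50% would need to account for read-count noise, which at 40× depth is substantial; the text's "far above 50%" suggests the observed margin was large enough for that not to matter.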

