PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors.

Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q - Genome Biol. (2015)

ABSTRACT
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
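The abstract uses variant allele frequencies (VAFs) of point mutations. Below is a minimal Python sketch of that quantity. It is not PhyloWGS code, and the function names are hypothetical. It computes a VAF from variant and reference read counts and converts it to a naive estimate of the fraction of cells carrying the mutation. The conversion assumes a heterozygous mutation at a diploid, copy-number-neutral locus, where the expected VAF is half that fraction.

```python
def vaf(var_reads: int, ref_reads: int) -> float:
    """Variant allele frequency: fraction of reads at a locus that carry the variant."""
    depth = var_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this locus")
    return var_reads / depth


def naive_prevalence(var_reads: int, ref_reads: int) -> float:
    """Naive fraction of cells (phi) carrying a heterozygous mutation.

    ASSUMPTION: the locus is diploid and copy-number neutral in every cell, so each
    mutant cell contributes one variant copy out of two and E[VAF] = phi / 2.
    The estimate is capped at 1 because sampling noise can push 2 * VAF above 1.
    """
    return min(1.0, 2.0 * vaf(var_reads, ref_reads))


print(vaf(15, 45))               # 0.25
print(naive_prevalence(15, 45))  # 0.5
```

This relationship fails at loci affected by copy number alterations, which is the situation the correction described in the abstract addresses. Suppose half the cells carry the mutation, and those cells have also gained a copy of the mutant allele (three copies, two of them variant), while the other cells keep two normal copies. The expected VAF is then (0.5 × 2) / (0.5 × 3 + 0.5 × 2) = 0.4. The naive estimate of 0.8 therefore overstates the true fraction of 0.5.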

Figure 9: True and inferred composition of three TCGA benchmark samples. The figure shows the true composition (left), the composition inferred by PhyloSub (center) and the composition inferred by THetA (right). Each bar represents a single sample.

Mentions: Next, we applied PhyloWGS to the TCGA variant-calling benchmark 4 dataset [41]. The samples we examined consist of a normal population, a cancerous cell-line population (HCC1143) and a spiked-in subclonal descendant of the cancerous population. These were mixed in various proportions at 30× coverage. Starting with the publicly available BAM files, we identified locations of possible structural variation using BIC-seq [37]. We used default parameters except for the bandwidth parameter, which we set to 1,000 because the default value of 100 produced overly noisy segmentations and highly variable normalized read counts. To identify simple somatic mutations (SSMs) and the number of variant and reference reads for each SSM, we first reverted the BAM files to unaligned reads using Picard 1.90 [42]. We then realigned the reads for each sample using BWA 0.6.2 [43] and collapsed them using Picard. The aligned reads of each cancerous sample and its matched normal were analyzed by two somatic variant callers, MuTect 1.1.4 [44] and Strelka 1.0.7 [45]. A set of high-confidence mutations was extracted by taking the intersection of the calls made by MuTect and Strelka. Previous verification with other tumor/normal pairs showed that this approach achieved >90% precision (data not shown).

We first ran THetA [15] on the output of BIC-seq, intending to use THetA's output as the copy number variation (CNV) information that PhyloWGS requires (see Materials and methods section). However, THetA returned nearly identical composition inferences for all the samples, even though the proportion of the subclonal population varied from 40% to 10% (see Figure 9). Because we could not rely on THetA's copy number calls, we instead removed all SSMs in locations where BIC-seq identified possible structural variation. This eliminated most of the SSMs, leaving only 62 of the original 4,344.
Despite this small number of SSMs, our algorithm still identified the correct number of populations and captured the changing composition of the samples. The inferred SSM content of each cluster was also identical across the three separate runs.
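The SSM filtering described above has two steps: intersecting the MuTect and Strelka calls, then discarding calls that fall in regions BIC-seq flagged as possible structural variation. Below is a minimal Python sketch of these steps under stated assumptions. It is not the authors' pipeline. Calls are represented as hypothetical `Call(chrom, pos, ref, alt)` tuples, already parsed from each caller's output. Flagged BIC-seq regions are hypothetical inclusive `Segment(chrom, start, end)` tuples. Two calls count as the same mutation only if all four fields match exactly.

```python
from typing import Dict, Iterable, List, NamedTuple


class Call(NamedTuple):
    """A somatic point mutation call (hypothetical representation)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


class Segment(NamedTuple):
    """A genomic interval flagged as possible structural variation (hypothetical)."""
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def consensus_calls(mutect: Iterable[Call], strelka: Iterable[Call]) -> List[Call]:
    """Keep only calls reported by both callers (exact match on all four fields)."""
    return sorted(set(mutect) & set(strelka))


def remove_flagged(calls: Iterable[Call], segments: Iterable[Segment]) -> List[Call]:
    """Drop every call that falls inside a flagged segment on the same chromosome."""
    by_chrom: Dict[str, List[Segment]] = {}
    for seg in segments:
        by_chrom.setdefault(seg.chrom, []).append(seg)
    kept = []
    for call in calls:
        same_chrom = by_chrom.get(call.chrom, [])
        if not any(seg.start <= call.pos <= seg.end for seg in same_chrom):
            kept.append(call)
    return kept


# Toy example with made-up calls and segments (not data from the benchmark).
mutect_calls = [Call("1", 100, "A", "T"), Call("1", 500, "G", "C"), Call("2", 42, "C", "T")]
strelka_calls = [Call("1", 100, "A", "T"), Call("1", 500, "G", "C"), Call("3", 7, "T", "G")]
flagged = [Segment("1", 400, 600)]

high_confidence = consensus_calls(mutect_calls, strelka_calls)
kept = remove_flagged(high_confidence, flagged)
print(kept)  # [Call(chrom='1', pos=100, ref='A', alt='T')]
```

The linear scan over segments is adequate for a few thousand calls, such as the 4,344 here. For genome-wide call sets, an interval tree or a sorted sweep over positions would scale better.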