Reconstruction of clonal trees and tumor composition from multi-sample sequencing data.

El-Kebir M, Oesper L, Acheson-Field H, Raphael BJ - Bioinformatics (2015)

Bottom Line: We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAF factorization problem in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence VAFs.


Affiliation: Center for Computational Molecular Biology and Department of Computer Science, Brown University, Providence, RI 02912, USA.


Fig. 4. Comparison of whole-exome (top) and deep sequencing data (bottom) for lung patient 330. (A) The histogram of observed VAFs for all mutations does not reveal a significant difference between the lower coverage (201X, top) and higher coverage (674X, bottom) sequencing data. (B) A heat map showing the posterior probability for all pairs of mutations i and j. The asymmetry in the matrix reveals high confidence ancestral relationships, which become much clearer with higher coverage. (C) The posterior distribution of the VAF for three mutations given the observed read counts. In higher coverage data, the distributions become much tighter, revealing that the red mutation is ancestral to the blue and green mutations.
© Copyright Policy - creative-commons


Mentions: A key difference between AncesTree and other approaches is that we use a graph clustering approach to group mutations by their putative ancestral relationships across all samples, rather than clustering VAFs directly. We demonstrate the advantages of this approach on a lung tumor [patient 330 in Zhang et al. (2014)] for which multiple samples were sequenced using both whole-exome and targeted deep sequencing (higher coverage). One would expect deep sequencing data to provide more accurate measurements of the VAF for each mutation because of the higher read counts. However, in aggregate, there is very little difference between the VAF histograms for whole-exome versus deep sequencing (Fig. 4A). Thus, methods that first cluster mutations according to their VAF without considering the variance in the VAFs of individual mutations implied by the observed read counts, including CITUP (Malikic et al., 2015) and LICHeE (Popic et al., 2014), will not recognize differences in clustering between the low and high coverage data.
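
The effect of coverage on per-mutation confidence can be illustrated with a minimal sketch. It assumes a binomial read-count model with a uniform Beta(1, 1) prior on each VAF, which may differ from the error model actually used by AncesTree; the read counts and the function names below are hypothetical, chosen only to match the 201X and 674X coverages quoted above. It computes the spread of the VAF posterior for one mutation and a Monte Carlo estimate of the posterior probability that one mutation's VAF exceeds another's.

```python
import math
import random


def beta_posterior(variant_reads, total_reads, prior_a=1.0, prior_b=1.0):
    """Parameters (a, b) of the Beta posterior over a VAF.

    Assumes variant reads are binomial in the total reads and a
    Beta(prior_a, prior_b) prior (uniform by default).
    """
    a = prior_a + variant_reads
    b = prior_b + (total_reads - variant_reads)
    return a, b


def posterior_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def prob_vaf_greater(post_i, post_j, draws=20000, seed=0):
    """Monte Carlo estimate of P(VAF_i > VAF_j) under independent posteriors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if rng.betavariate(*post_i) > rng.betavariate(*post_j):
            hits += 1
    return hits / draws


# Hypothetical counts: two mutations with observed VAFs near 0.30 and 0.25.
exome_i = beta_posterior(60, 201)    # lower coverage (201X)
exome_j = beta_posterior(50, 201)
deep_i = beta_posterior(202, 674)    # higher coverage (674X)
deep_j = beta_posterior(168, 674)

sd_exome = posterior_sd(*exome_i)
sd_deep = posterior_sd(*deep_i)
p_exome = prob_vaf_greater(exome_i, exome_j)
p_deep = prob_vaf_greater(deep_i, deep_j)

print(f"posterior sd of VAF_i: exome {sd_exome:.4f}, deep {sd_deep:.4f}")
print(f"P(VAF_i > VAF_j):      exome {p_exome:.3f}, deep {p_deep:.3f}")
```

Under these assumptions the observed VAFs are nearly identical at both coverages, so a histogram of VAFs would look the same, yet the posterior is narrower and the ordering of the two mutations is more certain at higher coverage. This is the kind of information that is lost when mutations are clustered by their point-estimate VAFs alone.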

