Reconstruction of clonal trees and tumor composition from multi-sample sequencing data.

El-Kebir M, Oesper L, Acheson-Field H, Raphael BJ - Bioinformatics (2015)

Bottom Line: We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAF factorization problem in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence VAFs.
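As a rough illustration of what "VAF factorization" means in the error-free case, the sketch below builds a frequency matrix F from a usage matrix U and a clonal matrix B as F = 0.5 * U * B. The factor 0.5 assumes heterozygous mutations in diploid regions; the toy tree, the matrix values and the function name `vaf_matrix` are hypothetical and are not taken from the article or from the AncesTree software. The sketch only shows the forward direction (U, B to F); the problem described above is the inverse one.

```python
# Toy forward model for the error-free case (illustrative assumption only).
# B[i][j] = 1 if clone i carries mutation j. Here clone 0 is the root and
# clones 1 and 2 are its children, with one new mutation per clone.
B = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

# U[p][i] = fraction of sample p made up of clone i (hypothetical values).
# Each row sums to at most 1; the remainder would be normal cells.
U = [
    [0.2, 0.5, 0.1],
    [0.0, 0.3, 0.6],
]


def vaf_matrix(U, B):
    """Return F with F[p][j] = 0.5 * sum_i U[p][i] * B[i][j]."""
    n_clones = len(B)
    n_mutations = len(B[0])
    return [
        [0.5 * sum(u_row[i] * B[i][j] for i in range(n_clones))
         for j in range(n_mutations)]
        for u_row in U
    ]


F = vaf_matrix(U, B)
for row in F:
    print([round(f, 3) for f in row])
```

Each row of F gives the expected VAF of every mutation in one sample; the root mutation has the highest frequency in both samples because every clone carries it.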


Affiliation: Center for Computational Molecular Biology and Department of Computer Science, Brown University, Providence, RI 02912, USA.


Fig. 3 (btv261-F3): Violin plots comparing AncesTree, PhyloSub and CITUP on simulated data. (A) Accuracy of each method in predicting when mutations are ancestral to each other or (B) clustered in the same population. (C) Error in the inferred VAF values f_pj and (D) usage values u_pj. Median values are indicated below each algorithm.


Mentions: We created 90 synthetic tumor datasets. Each dataset contains 100 mutations grouped into 10 clones that accumulated following the infinite sites assumption. For each dataset, we simulated between four and six samples sequenced at a coverage of 50X, 100X or 1000X. Further details of the simulated data are contained in the Supplementary Appendix. We ran AncesTree, PhyloSub and CITUP on each dataset and compared the results using five measures: (i) the fraction of ancestral relationships between pairs of mutations that were correctly identified (Fig. 3A); (ii) the fraction of clustered relationships between pairs of mutations that were correctly identified (Fig. 3B); (iii) the fraction of incomparable relationships (i.e. neither ancestral nor clustered) between pairs of mutations that were correctly identified (Supplementary Fig. A2); (iv) the average error between the simulated and inferred frequency matrix F (Fig. 3C); and (v) the error between the simulated and inferred usage matrix U, using the same metric as Malikic et al. (2015) (Fig. 3D). We compute these measures only on the set of mutations included in the output of all methods. Because CITUP and PhyloSub include all mutations, this is the set of mutations output by AncesTree (median of 69 of the 100 total mutations). AncesTree determines ancestral, clustered and incomparable relationships more accurately than the other methods: its median accuracy is higher than theirs by more than 0.05, 0.03 and 0.08, respectively. Further, AncesTree achieves a median error on F and U that is 0.01 and 0.03 lower, respectively, than the median error of the other methods. See Supplementary Appendix for further details on all five metrics.
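Measures (i) to (iii) can be sketched as a pairwise comparison between the simulated tree and an inferred tree. The code below is a minimal illustration, not the evaluation code used in the article: it assumes each tree is given as a mutation-to-clone map plus a clone-to-parent map, it scores each unordered mutation pair by its true relationship, and all names (`is_ancestor`, `relationship`, `pairwise_accuracy`) and the toy trees are hypothetical. The exact definitions are in the Supplementary Appendix and may differ.

```python
from itertools import combinations


def is_ancestor(parent, x, y):
    """True if clone x is a proper ancestor of clone y in the tree `parent`."""
    node = parent.get(y)
    while node is not None:
        if node == x:
            return True
        node = parent.get(node)
    return False


def relationship(clone_of, parent, a, b):
    """Classify the mutation pair (a, b) within a single tree."""
    ca, cb = clone_of[a], clone_of[b]
    if ca == cb:
        return "clustered"
    if is_ancestor(parent, ca, cb):
        return "ancestor"      # a arose before b on the same lineage
    if is_ancestor(parent, cb, ca):
        return "descendant"    # b arose before a on the same lineage
    return "incomparable"


def pairwise_accuracy(true_tree, inferred_tree):
    """Fraction of true relationships of each kind that the inferred tree recovers.

    Only mutations present in both trees are scored, mirroring the
    restriction to mutations output by all methods.
    """
    true_clone, true_parent = true_tree
    inf_clone, inf_parent = inferred_tree
    shared = sorted(set(true_clone) & set(inf_clone))
    correct = {"ancestral": 0, "clustered": 0, "incomparable": 0}
    total = {"ancestral": 0, "clustered": 0, "incomparable": 0}
    for a, b in combinations(shared, 2):
        t = relationship(true_clone, true_parent, a, b)
        i = relationship(inf_clone, inf_parent, a, b)
        kind = "ancestral" if t in ("ancestor", "descendant") else t
        total[kind] += 1
        correct[kind] += int(t == i)  # direction must match for ancestral pairs
    return {k: correct[k] / total[k] for k in total if total[k] > 0}


# Hypothetical toy input: four mutations, three clones per tree.
true_tree = (
    {"m1": "A", "m2": "A", "m3": "B", "m4": "C"},
    {"A": None, "B": "A", "C": "A"},
)
inferred_tree = (
    {"m1": "X", "m2": "Y", "m3": "Y", "m4": "Z"},
    {"X": None, "Y": "X", "Z": "X"},
)

acc = pairwise_accuracy(true_tree, inferred_tree)
print(acc)
```

In this toy case the inferred tree recovers two of the four ancestral pairs, misses the one clustered pair and recovers the one incomparable pair, so the three accuracies are 0.5, 0.0 and 1.0.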

