A comparative analysis of algorithms for somatic SNV detection in cancer.

Roberts ND, Kortschak RD, Parker WT, Schreiber AW, Branford S, Scott HS, Glonek G, Adelson DL - Bioinformatics (2013)

Bottom Line: Four recently published algorithms for the detection of somatic SNV sites in matched cancer-normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer-normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences with respect to the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to low-allelic-fraction candidates.


Affiliation: School of Molecular and Biomedical Science and School of Mathematical Sciences, University of Adelaide, South Australia, Australia.

ABSTRACT

Motivation: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer-normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer-normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm.

Results: Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences with respect to the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to low-allelic-fraction candidates.

Availability: Data accession number SRA081939, code at http://code.google.com/p/snv-caller-review/

Contact: david.adelson@adelaide.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.


Fig. 2 (btt375-F2). Probability scores of somatic candidates in common between pairs of algorithms for the CML exome. Pearson correlation coefficients between pairs are VS&SS 0.50, VS&JS 0.59, VS&ST 0.42, SS&JS 0.23, SS&ST 0.21 and JS&ST 0.46 (VS, VarScan; SS, SomaticSniper; JS, JointSNVMix2; ST, Strelka).
© Copyright Policy - creative-commons

Mentions: The distribution of somatic probability scores for sites unique to each caller and for sites returned by multiple callers, as shown in Figure 1, indicates that filtering out sites returned by only one algorithm would remove sites that the algorithm in question regards as having high somatic probability. Furthermore, Figure 2 shows that many sites returned with a high probability score by one caller are returned with a much lower probability score by another. If the first step of post-calling filtration were to remove from each caller’s output the sites returned with low probability, then information on sites found by multiple callers but at markedly different probabilities would be lost. Given the poor correlations between probability scores from different callers for the same sites, the intrinsic value of these scores is questionable.
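The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes each caller's output has already been reduced to a mapping from a site key (here a hypothetical `(chromosome, position)` tuple) to a somatic probability in the range 0 to 1, and the function names and the `high`/`low` thresholds are invented for the example. It computes the Pearson correlation over the sites shared by each pair of callers, and lists the shared sites that one caller scores highly and the other scores poorly, which are the sites that a per-caller probability cut-off applied first would hide.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant scores: correlation is undefined
    return sxy / sqrt(sxx * syy)


def pairwise_score_correlations(calls):
    """Correlate scores over the sites shared by each pair of callers.

    `calls` maps caller name -> {site: somatic probability} (assumed layout).
    Returns {(caller_a, caller_b): (number_of_shared_sites, correlation)}.
    """
    result = {}
    for a, b in combinations(sorted(calls), 2):
        shared = sorted(set(calls[a]) & set(calls[b]))
        if len(shared) < 2:
            continue  # a correlation needs at least two shared sites
        xs = [calls[a][site] for site in shared]
        ys = [calls[b][site] for site in shared]
        result[(a, b)] = (len(shared), pearson(xs, ys))
    return result


def discordant_sites(calls, a, b, high=0.9, low=0.5):
    """Shared sites scored >= high by one caller and <= low by the other.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    shared = sorted(set(calls[a]) & set(calls[b]))
    return [
        site
        for site in shared
        if (calls[a][site] >= high and calls[b][site] <= low)
        or (calls[b][site] >= high and calls[a][site] <= low)
    ]
```

Run on real caller output, `pairwise_score_correlations` would yield one value per caller pair, comparable in form to the six coefficients quoted in the Figure 2 caption, while `discordant_sites` makes explicit which shared candidates would be dropped from one caller's list by an early probability filter.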

