High accuracy mutation detection in leukemia on a selected panel of cancer genes.

Kalender Atak Z, De Keersmaecker K, Gianfelici V, Geerdens E, Vandepoel R, Pauwels D, Porcu M, Lahortiga I, Brys V, Dirks WG, Quentmeier H, Cloos J, Cuppens H, Uyttebroeck A, Vandenberghe P, Cools J, Aerts S - PLoS ONE (2012)

Bottom Line: We tested nine combinations of different mapping and variant-calling methods, varied the variant calling parameters, and compared the predicted mutations with a large independent validation set obtained by capillary re-sequencing. We confirmed mutations in known T-ALL drivers, including PHF6, NF1, FBXW7, NOTCH1, KRAS, NRAS, PIK3CA, and PTEN. Finally, we re-sequenced a small set of 39 candidate genes and identified recurrent mutations in TET1, SPRY3 and SPRY4.


Affiliation: Center for Human Genetics, KU Leuven, Leuven, Belgium.

ABSTRACT
With the advent of whole-genome and whole-exome sequencing, high-quality catalogs of recurrently mutated cancer genes are becoming available for many cancer types. Increasing access to sequencing technology, including bench-top sequencers, provides the opportunity to re-sequence a limited set of cancer genes across a patient cohort with limited processing time. Here, we re-sequenced a set of cancer genes in T-cell acute lymphoblastic leukemia (T-ALL) using Nimblegen sequence capture coupled with Roche/454 technology. First, we investigated through a benchmark study how maximal sensitivity and specificity of mutation detection can be achieved. We tested nine combinations of different mapping and variant-calling methods, varied the variant calling parameters, and compared the predicted mutations with a large independent validation set obtained by capillary re-sequencing. We found that the combination of two mapping algorithms, namely BWA-SW and SSAHA2, coupled with the variant calling algorithm Atlas-SNP2 yields the highest sensitivity (95%) and the highest specificity (93%). Next, we applied this analysis pipeline to identify mutations in a set of 58 cancer genes, in a panel of 18 T-ALL cell lines and 15 T-ALL patient samples. We confirmed mutations in known T-ALL drivers, including PHF6, NF1, FBXW7, NOTCH1, KRAS, NRAS, PIK3CA, and PTEN. Interestingly, we also found mutations in several cancer genes that had not been linked to T-ALL before, including JAK3. Finally, we re-sequenced a small set of 39 candidate genes and identified recurrent mutations in TET1, SPRY3 and SPRY4. In conclusion, we established an optimized analysis pipeline for Roche/454 data that can be applied to accurately detect gene mutations in cancer, which led to the identification of several new candidate T-ALL driver mutations.


Figure 1: Performance comparison and parameter optimization. (A) Different pipelines show different sensitivity and specificity. Varying the DoC and VAF thresholds in the variant calling process has an additional effect on the predictions in terms of sensitivity and specificity, respectively. Each pipeline is represented with a different symbol, and the performance of each pipeline (in terms of sensitivity and specificity) is plotted under varying DoC and VAF thresholds. Note that the X-axis represents the false positive rate (1-specificity). In this ROC plot, the closer a point is to the upper left corner of the graph, the better the sensitivity and the specificity. Different colors of the symbols indicate the performance of the pipeline under changing VAF thresholds, and the two shaded boxes indicate the performance under changing DoC thresholds. The plot shows that (i) decreasing the DoC threshold increases the sensitivity of all pipelines, as indicated with the blue dotted line; (ii) increasing the VAF threshold increases the specificity with a slight decrease in sensitivity, as indicated (in the example of the BLAT+VarScan pipeline) with the red dotted line; (iii) the BWA-SW+SSAHA2+Atlas-SNP2 pipeline has the best performance among all pipelines under the DoC = 3 and VAF = 0.20 thresholds, as indicated with the yellow arrow. The Roche pipeline is indicated with a black diamond since no parameter changes were performed on it, and the SSAHA2+SAMTools and BWA-SW+SAMTools pipelines are colored grey since no VAF threshold changes were performed on them. (B) The Matthews correlation coefficient for each pipeline is shown for the optimal performance of that pipeline (Table S1). It is interesting to note that the optimal performance of all the pipelines, except Roche gsMapper, was observed for a DoC threshold of 3.

Mentions: Next, we further optimized the performance of each pipeline by varying the minimal required number of reads (depth of coverage, DoC) and the minimal required fraction of variant reads (variant allele frequency, VAF). Changes in the DoC threshold mainly affected the sensitivity, while varying the VAF threshold mainly affected the specificity (Figure 1A, Table S2). All the pipelines reached their best performance with a DoC threshold of 3 and with a minimum VAF threshold of 0.20 (when applicable) (Tables S1-S2). In a final effort to minimize false positive predictions, we combined the two best mapping algorithms in one pipeline, which further increased the sensitivity to 95% and the specificity to 93%. The reason for this increase in accuracy is that certain predicted variants that are caused by erroneous mapping (Figure S1) are now filtered out. Although this final pipeline (SSAHA2+BWA-SW+Atlas-SNP2) performs better than gsMapper (91.2% sensitivity and 90.8% specificity), the difference is not large, and gsMapper can be considered a valid (and often easy to use) alternative (Figure 1B).
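
The two filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Call` record, the function names, and the example calls are hypothetical, the thresholds are assumed to be inclusive (`>=`), and "combining" the two mappers is assumed to mean keeping only variants called from both alignments, which is one way that variants caused by erroneous mapping in a single aligner would be filtered out.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical variant call record (not a format from the paper)."""
    chrom: str
    pos: int
    alt: str
    depth: int      # total reads covering the position (DoC)
    alt_reads: int  # reads supporting the variant allele


def passes(call, min_doc=3, min_vaf=0.20):
    """Apply the DoC and VAF thresholds (assumed inclusive)."""
    if call.depth < max(min_doc, 1):  # also guards against division by zero
        return False
    return call.alt_reads / call.depth >= min_vaf


def consensus(calls_a, calls_b, min_doc=3, min_vaf=0.20):
    """Keep variants that pass the thresholds in both call sets."""
    def key(c):
        return (c.chrom, c.pos, c.alt)

    kept_a = {key(c) for c in calls_a if passes(c, min_doc, min_vaf)}
    kept_b = {key(c) for c in calls_b if passes(c, min_doc, min_vaf)}
    return kept_a & kept_b


# Hypothetical calls from two alignments of the same reads.
calls_bwasw = [
    Call("chr9", 100, "T", depth=10, alt_reads=4),
    Call("chr1", 50, "A", depth=12, alt_reads=6),   # seen by one mapper only
    Call("chr2", 70, "G", depth=2, alt_reads=2),    # fails DoC threshold
]
calls_ssaha2 = [
    Call("chr9", 100, "T", depth=9, alt_reads=3),
    Call("chr2", 70, "G", depth=2, alt_reads=2),    # fails DoC threshold
    Call("chr3", 5, "C", depth=20, alt_reads=2),    # fails VAF threshold
]

shared = consensus(calls_bwasw, calls_ssaha2)
print(shared)
```

In this sketch, lowering `min_doc` admits more low-coverage calls (more sensitivity), while raising `min_vaf` discards weakly supported calls (more specificity), mirroring the trends reported in the paragraph above.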

