Quantitative analysis of differences in copy numbers using read depth obtained from PCR-enriched samples and controls.

Reinecke F, Satya RV, DiCarlo J - BMC Bioinformatics (2015)

Bottom Line: PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far. We describe a new algorithm named quandico to detect copy number differences based on NGS data generated following PCR-enrichment. A weighted t-test statistic was applied to calculate probabilities (p-values) of copy number changes.


Affiliation: Bioinformatics Assay Design & Analysis, QIAGEN GmbH, Max-Volmer-Straße 4, Hilden, 40724, Germany. frank.reinecke@qiagen.com.

ABSTRACT

Background: Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest. Several algorithms exist to detect CNVs by analyzing whole genome sequencing data or data from samples enriched by hybridization-capture. PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far.

Results: We describe a new algorithm named quandico to detect copy number differences based on NGS data generated following PCR-enrichment. A weighted t-test statistic was applied to calculate probabilities (p-values) of copy number changes. We assessed the performance of the method using sequencing reads generated from reference DNA with known CNVs, and we were able to detect these variants with 98.6% sensitivity and 98.5% specificity, which is significantly better than another recently described method for amplicon sequencing. The source code (R package) of quandico is licensed under the GPLv3 and is available at https://github.com/reineckef/quandico .

Conclusion: We demonstrated that our new algorithm is suitable to call copy number changes using data from PCR-enriched samples with high sensitivity and specificity even for single copy differences.
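
The weighted t-test mentioned in the Results can be illustrated with a minimal sketch. This is not the quandico implementation (which is an R package); it is a generic weighted one-sample t statistic, assuming reliability-type weights and a null hypothesis of no copy number change (mean log2 ratio of 0). The function name `weighted_t_statistic`, the parameter `mu0`, and the example values are hypothetical.

```python
import math


def weighted_t_statistic(values, weights, mu0=0.0):
    """Return (t, df) for a weighted one-sample t-test of `values` against `mu0`.

    Assumption: weights are positive reliability weights (e.g. derived from
    read depth), not the weighting scheme used by quandico itself.
    """
    if len(values) != len(weights) or len(values) < 2:
        raise ValueError("need at least two values and one weight per value")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    v1 = sum(weights)                    # sum of weights
    v2 = sum(w * w for w in weights)     # sum of squared weights
    mean = sum(w * x for w, x in zip(weights, values)) / v1

    # Unbiased weighted variance for reliability weights.
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / (v1 - v2 / v1)

    n_eff = v1 * v1 / v2                 # effective sample size
    se = math.sqrt(var / n_eff)          # standard error of the weighted mean
    if se == 0.0:
        raise ValueError("zero variance: t statistic is undefined")
    return (mean - mu0) / se, n_eff - 1.0


# Hypothetical log2(sample/control) depth ratios for the amplicons of one locus,
# weighted by hypothetical control read depths.
log2_ratios = [0.52, 0.61, 0.47, 0.58]
depths = [1200, 800, 1500, 950]
t_stat, df = weighted_t_statistic(log2_ratios, depths)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

With equal weights the statistic reduces to the ordinary one-sample t statistic. Converting `t_stat` and `df` into a p-value requires the t distribution, for example from a statistics library.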



Figure 1: Dispersion correction. Illustration of the dispersion correction effect by ϕ. First row: before correction, second row: after correction. Calls for the sequencing data sets (M62 and M63) and both control samples (NA12898, NA19129) are plotted separately (in columns). A CNV is called if the determined Q score is higher than the threshold (normalized to 1 in this diagram). False classifications (FN: false negative, FP: false positive) are shown in red. Loci with known CNVs in the sample are shown as dots and loci with normal copy number are plotted as crosses.

Mentions: The factor ϕ in equation 8 was introduced to correct false classifications. Without correction, clusters with an expected negative result and low dispersion showed over-estimated p-values (Figure 1, top row: × colored in red, false positives). On the other hand, p-values for clusters with known CNVs were under-estimated if the dispersion was high (∙, also red, false negatives). The p-values alone did not allow optimal discrimination. After correction with the factor ϕ, the linear discrimination yielded both fewer false positives and fewer false negatives (bottom row in Figure 1).
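
The classification described in Figure 1 (a CNV is called when the Q score exceeds a threshold normalized to 1) and the resulting sensitivity and specificity can be sketched as follows. This is an illustrative example only: the function names and the Q scores are hypothetical, and the sketch does not reproduce equation 8 or the ϕ correction, which are not given here.

```python
def confusion_counts(q_scores, has_cnv, threshold=1.0):
    """Count TP, FP, TN, FN when a CNV is called for every Q score > threshold."""
    tp = fp = tn = fn = 0
    for q, truth in zip(q_scores, has_cnv):
        called = q > threshold
        if called and truth:
            tp += 1          # known CNV, correctly called
        elif called and not truth:
            fp += 1          # normal copy number, wrongly called
        elif not called and truth:
            fn += 1          # known CNV, missed
        else:
            tn += 1          # normal copy number, correctly not called
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical normalized Q scores and known CNV status per locus.
q_scores = [1.8, 0.9, 2.4, 0.2, 1.1, 0.4]
has_cnv = [True, True, True, False, False, False]

counts = confusion_counts(q_scores, has_cnv)
sens, spec = sensitivity_specificity(*counts)
print(counts, sens, spec)
```

A correction such as ϕ would act before this step by changing the Q scores, so that fewer loci fall on the wrong side of the threshold.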

