CNV-seq, a new method to detect copy number variation using high-throughput sequencing.

Xie C, Tammi MT - BMC Bioinformatics (2009)

Bottom Line: Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. Simulation of various sequencing methods with coverage between 0.1x and 8x shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.


Affiliation: Department of Biological Sciences, National University of Singapore, Singapore. xie@nus.edu.sg

ABSTRACT

Background: DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations.

Results: Here, we describe CNV-seq, a method to detect copy number variation using shotgun sequencing. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads.

Conclusion: Simulation of various sequencing methods with coverage between 0.1x and 8x shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.


Figure 3: Performance of CNV-seq on data simulating 454, Sanger and Solexa methods. Results are shown for 0.1×–8× coverages (right) and a p-value range of 10^-5 to 10^-2 (top). Each dot represents an average of 100 simulations, and the size of each dot represents the window size (log10), i.e. the resolution used. The window sizes are calculated using equation (5).

Mentions: Figure 3 shows the results of the simulations for varied coverage and varied p' at constant log2(r') = 0.6. Each dot represents an average of 100 simulations, and the size of each dot reflects the length of the sliding window, which is the theoretical minimum length given by equation (5). The overall specificity of our method is between 91.7% and 99.9% and the sensitivity between 72.2% and 96.5%, with medians of 99.4% and 89.9%, respectively. The mean sequence length depends on the technology simulated. Thus, to reach the same coverage, a larger number of fragments must be sequenced with Solexa, which produces short reads, than with the Sanger and 454 methods. According to our model, the largest number of sequenced reads yields the shortest sliding window and thus the best resolution. The window sizes in our simulations range from 1,103 bases to 2,951,792 bases, decreasing with increasing average sequencing coverage. The results show that our model performs well in the presence of errors. Although the shorter sliding window increases resolution, sensitivity still increases with sequencing coverage. A slight drop in specificity with increasing sequencing coverage can be observed (Figure 3). This is likely due to SNPs, short indels, and read mapping errors, which are not considered in our statistical model and have a more profound effect on small windows. The specificity does not drop in error-free data. The effect of errors may be reduced by using a window size larger than the theoretical minimum. For example, the theoretical minimum window for 8× Solexa sequencing at p = 0.001 is 1947 bases. This window size gives a specificity of 95.4%, while a window twice as large yields a specificity of 97.8% (Figure 4).
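The relation between read length and read count described above can be illustrated with a minimal sketch. It relies only on the standard definition of average coverage (number of reads × mean read length / genome size). The genome size and the per-platform read lengths below are illustrative assumptions, not values taken from the article, and the function name `reads_for_coverage` is hypothetical.

```python
import math


def reads_for_coverage(coverage: float, genome_size: int, mean_read_length: float) -> int:
    """Smallest number of reads N with N * mean_read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / mean_read_length)


# Assumed, illustrative values only (not from the article).
GENOME_SIZE = 3_000_000_000  # rough size of a human genome in bases
READ_LENGTHS = {"Sanger": 800, "454": 250, "Solexa": 35}  # mean read length in bases

if __name__ == "__main__":
    for platform, length in READ_LENGTHS.items():
        n = reads_for_coverage(1.0, GENOME_SIZE, length)
        print(f"{platform}: {n:,} reads for 1x coverage")
```

Under these assumed read lengths, the short-read platform needs many more reads to reach the same coverage, which is the property the model links to a shorter sliding window. The window length itself comes from equation (5), which is not reproduced here.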

