Statistical aspects of discerning indel-type structural variation via DNA sequence alignment.

Wendl MC, Wilson RK - BMC Genomics (2009)

Bottom Line: Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate.

Affiliation: The Genome Center and Department of Genetics, Washington University, St Louis, MO 63108, USA. mwendl@wustl.edu

ABSTRACT

Background: Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequencing reads. Quantitative aspects underlying this method remain poorly understood, despite its importance and conceptual simplicity. We report the statistical theory characterizing the length-discrepancy scheme for Gaussian libraries, including coverage-related effects that preceding models are unable to account for.

Results: Deletion and insertion statistics both depend heavily on physical coverage, but otherwise differ dramatically, refuting a commonly held doctrine of symmetry. Specifically, coverage restrictions render insertions much more difficult to capture. Increased read length has the counterintuitive effect of worsening insertion detection characteristics of short inserts. Variance in library insert length is also a critical factor here and should be minimized to the greatest degree possible. Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. We also consider the proposition of characterizing variation over the entire spectrum of variant sizes under constant risk of false-positive errors. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate. We show that a few modifications largely close this gap and we give a few examples of feasible spectrum-covering designs.

Conclusion: The theory resolves several outstanding issues and furnishes a general methodology for designing future projects from the standpoint of a spectrum-wide constant risk.
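
The length-discrepancy idea described in the Background can be illustrated with a minimal sketch. It assumes a Gaussian library with known mean and standard deviation and applies a naive per-clone z-test; the helper name `classify_clone` and the example numbers are hypothetical. It deliberately ignores the coverage-related effects that the paper's theory accounts for, so it should be read as an illustration of the basic scheme rather than the authors' method.

```python
from statistics import NormalDist


def classify_clone(aligned_length, mu, sigma, alpha=0.01):
    """Label one paired-end clone by its aligned length on the reference.

    Hypothetical helper: a naive per-clone z-test, not the paper's theory.
    Each tail is tested at level alpha, so the combined per-clone
    false-positive rate under the null is 2 * alpha.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    delta = aligned_length - mu
    if delta > z * sigma:
        return "deletion"    # ends map farther apart than expected
    if delta < -z * sigma:
        return "insertion"   # ends map closer together than expected
    return "concordant"


# Assumed example library: 40 kb mean insert, sigma = 2800 bp.
for span in (48_000, 41_000, 32_000):
    print(span, classify_clone(span, mu=40_000, sigma=2_800))
```

With these assumed values the three spans are labeled deletion, concordant, and insertion respectively. An insertion can only be flagged this way if a clone actually spans it, which is one reason the abstract describes insertions as much harder to capture than deletions.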

Figure 4: Curves of α vs σ for heterozygous ISV using 40 kb fosmids on "small" (m/λ = 0.2) and "large" (m/λ = 0.8) insertions. Vertical reference line shows the 7% COV threshold, characteristic of the library in ref. [6].

Mentions: Library variance is conventionally thought of as something that should be minimized to the greatest extent possible in order to improve SV detection [9]. This view actually comes with some caveats, as illustrated by Fig. 4 for heterozygous ISV detection using 40 kb fosmids. Fosmid libraries can routinely achieve COV of around 7% because of packaging constraints inherent to the vector [6]. Yet, α is largely constant for COV ≤ 7% over a wide range of redundancies, implying that special efforts aimed at further reducing fosmid library variance would be unwarranted. While some sensitivity is actually realized for very small ISV, i.e. less than 10% of insert size, the limit on precision mentioned above renders these instances irrelevant.
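
To put the percentages above on an absolute scale, the short sketch below converts them to base pairs. It assumes COV is the usual coefficient of variation (standard deviation divided by mean) and that "insert size" refers to the 40 kb fosmid mean; the function name `sigma_from_cov` is hypothetical.

```python
def sigma_from_cov(mean_insert_bp, cov):
    """Standard deviation implied by a coefficient of variation (sigma / mean)."""
    return cov * mean_insert_bp


mean_insert_bp = 40_000                           # fosmid insert size from the text
sigma_bp = sigma_from_cov(mean_insert_bp, 0.07)   # 7% COV threshold
small_isv_bp = 0.10 * mean_insert_bp              # "very small" ISV cutoff

print(f"sigma at 7% COV: {sigma_bp:.0f} bp")
print(f"10% of insert size: {small_isv_bp:.0f} bp")
```

Under these assumptions, the 7% COV reference line corresponds to a standard deviation of about 2.8 kb, and the "very small" insertions mentioned in the text are those below about 4 kb.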

