Error correction and diversity analysis of population mixtures determined by NGS.

Wood GR, Burroughs NJ, Evans DJ, Ryabov EV - PeerJ (2014)

Bottom Line: The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated.


Affiliation: Warwick Systems Biology Centre, University of Warwick , Coventry , United Kingdom.

ABSTRACT
The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.


Figure 2: The estimated residual error rate after correction (an estimate of β), plotted against the initial error rate α, when the true initial distribution is the degenerate p = (1, 0, 0, 0). The larger the coverage n, the more accurately the observed distribution estimates q and so the more dependable the correction, hence the smaller is β. Curves are shown for coverage n of 1,000 (blue), 2,000 (green), 5,000 (red), 10,000 (cyan) and 20,000 (magenta). Each point on the graph is the mean of N = 5,000 replicate trials.

Mentions: It is possible to estimate an upper bound for the residual error rate remaining after correction, given an NGS error rate α and coverage n. This is done by using the calibration step with p_standard = (1, 0, 0, 0), which yields an estimate of the residual error rate β. The simulation proceeds by first generating n values using q = M(α) p, then correcting using rate α. A graph of the estimated residual error rate against α is given in Fig. 2, for the case p_standard = (1, 0, 0, 0) and n = 1,000, 2,000, 5,000, 10,000 and 20,000. This demonstrates, for these parameters, that correction reduces the error by a factor of over 10 for a coverage of n = 1,000. That this is an upper bound is made clear in the later Discussion section.
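
The generate-then-correct simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Excel or Matlab software. The form of M(α) is not given in this excerpt, so the sketch assumes a symmetric error model in which a nucleotide is read correctly with probability 1 − α and misread as each of the other three with probability α/3. The residual error measure used here (the corrected mass not assigned to the true nucleotide, after clipping negative estimates) is likewise a hypothetical stand-in for the paper's calibration-based estimate, and all function names are invented for the example.

```python
import random

def forward_distribution(p, alpha):
    """Observed distribution q = M(alpha) p under the ASSUMED symmetric model."""
    # Assumed M has 1 - alpha on the diagonal and alpha/3 elsewhere, so for a
    # probability vector p: q_i = (1 - 4*alpha/3) * p_i + alpha/3.
    return [(1 - 4 * alpha / 3) * pi + alpha / 3 for pi in p]

def correct(q_hat, alpha):
    """Invert the assumed model (needs alpha < 0.75), clip, renormalise."""
    raw = [(qi - alpha / 3) / (1 - 4 * alpha / 3) for qi in q_hat]
    clipped = [max(0.0, x) for x in raw]  # sampling noise can give negatives
    total = sum(clipped)
    return [x / total for x in clipped]

def mean_residual_error(alpha, n, trials, rng):
    """Mean corrected mass off the true nucleotide when p = (1, 0, 0, 0)."""
    p = [1.0, 0.0, 0.0, 0.0]
    q = forward_distribution(p, alpha)
    total = 0.0
    for _ in range(trials):
        # Simulate n reads at one site, each drawn from q.
        reads = rng.choices(range(4), weights=q, k=n)
        q_hat = [reads.count(i) / n for i in range(4)]
        p_hat = correct(q_hat, alpha)
        total += 1.0 - p_hat[0]  # hypothetical residual-error measure
    return total / trials

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (1000, 10000):
        print(n, mean_residual_error(0.01, n, 100, rng))
```

Because the error model and the residual measure are assumptions, the numbers this prints should not be expected to reproduce the curves in Fig. 2. The sketch is meant only to show the qualitative behaviour the caption describes: the residual falls as the coverage n grows.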

