Error correction and diversity analysis of population mixtures determined by NGS.

Wood GR, Burroughs NJ, Evans DJ, Ryabov EV - PeerJ (2014)

Bottom Line: The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated.


Affiliation: Warwick Systems Biology Centre, University of Warwick, Coventry, United Kingdom.

ABSTRACT
The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.


Figure 1: The stages of NGS nucleotide distribution error correction. (A) Error is introduced to the true nucleotide distribution $p$, giving the observed distribution $\hat{q}$, the result of multinomial sampling with $n$ trials (the coverage, the number of nucleotides sequenced) in which the true sequenced nucleotide distribution is given by $q = Mp$. The error is corrected by forming $\hat{p}$, the normalisation of $(M^{-1}\hat{q})^{+}$. In practice $M$ must be estimated (the calibration step), using a known initial distribution $p$. (B) The initial diversity $H(p)$ increases to $H(\hat{q})$ under NGS, but falls back to $H(\hat{p})$ when corrected.

Mentions: The NGS error can be partly corrected by reversing the mutation process, first multiplying the observed distribution $\hat{q}$ by $M^{-1}$. In practice, because of the sampling variation introduced during NGS, the reversal can yield negative components. For this reason we must bring these back to zero and then normalise so that the nucleotide proportions sum to one. Thus, the corrected nucleotide distribution is the component-wise maximum of $M^{-1}\hat{q}$ and $(0, 0, 0, 0)$, denoted $(M^{-1}\hat{q})^{+}$, normalised, so $\hat{p} = (M^{-1}\hat{q})^{+} / \sum_i (M^{-1}\hat{q})^{+}_i$. This transformation has the effect of moving all probabilities away from 0.25, so decreasing diversity. Only when $\hat{q} = q$ does this reversal process work exactly, returning $p$. Figure 1A lays out these stages, from $p$, the true nucleotide distribution in the organism, to $\hat{q}$, the distribution following NGS, to $\hat{p}$, the corrected distribution. To illustrate this numerically, with $p = (1, 0, 0, 0)$, $\alpha = 0.001$ and $n = 10{,}000$, the counts $(n_1, n_2, n_3, n_4)$ could be $(9989, 4, 3, 4)$ (using a multinomial distribution), whence $\hat{q} = (0.9989, 0.0004, 0.0003, 0.0004)$ and, after reversal, truncation at zero and normalisation, $\hat{p} = (1, 0, 0, 0)$. In this example the correction is exact, but this is not always the case, so the estimate will still be subject to error.
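
The correction steps described above (multiply by the inverse of M, truncate negative components to zero, renormalise) can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' Excel or Matlab software. The form of M is not given in this excerpt, so the helper `error_matrix` assumes a symmetric model in which each nucleotide is misread as each of the other three with probability α; this form is an assumption, chosen because it is consistent with the worked example above being corrected exactly. The names `error_matrix` and `correct_distribution` are hypothetical.

```python
import numpy as np


def error_matrix(alpha):
    """Hypothetical symmetric error model (an assumption, not the paper's
    calibrated M): diagonal entries 1 - 3*alpha, off-diagonal entries alpha."""
    return (1.0 - 4.0 * alpha) * np.eye(4) + alpha * np.ones((4, 4))


def correct_distribution(counts, M):
    """Reverse the error model, truncate negatives to zero, renormalise."""
    counts = np.asarray(counts, dtype=float)
    q_hat = counts / counts.sum()      # observed distribution after NGS
    raw = np.linalg.solve(M, q_hat)    # M^{-1} q_hat; entries may be negative
    clipped = np.maximum(raw, 0.0)     # component-wise maximum with (0, 0, 0, 0)
    return clipped / clipped.sum()     # proportions sum to one again


# Worked example from the text: p = (1, 0, 0, 0), alpha = 0.001, n = 10,000.
M = error_matrix(alpha=0.001)
p_hat = correct_distribution([9989, 4, 3, 4], M)
print(p_hat)  # [1. 0. 0. 0.]
```

Under this assumed M the three minor components become negative after the reversal, are truncated to zero, and the normalisation returns (1, 0, 0, 0). With a calibrated M estimated from known genotypes, only the argument passed to `correct_distribution` would change.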

