Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
Bottom Line: A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.
Affiliation: School of Engineering, University of Glasgow, Glasgow, UK email@example.com.
Mentions: Figure 2 displays the position-specific insertion and deletion profiles as well as the distribution of unknown nucleotides (Ns) across all reads. As previously reported, the insertion and deletion (indel) rates are ≈100× lower than the substitution rates. We also observed that insertions, with rates of 0.000040 and 0.000043 for R1 and R2 reads, respectively, are roughly twice as likely as deletions, for which we observed rates of 0.000017 and 0.000027 for R1 and R2 reads, respectively. Again, the majority of indels seem to concentrate around certain positions, with rates up to 225× higher than the average indel rate (see Table 2). The non-uniform distribution of unknown nucleotides (Ns) indicates that Ns, too, do not occur randomly.
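The insertion-to-deletion comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: it uses only the four rates quoted in the text, and the names `rates` and `insertion_to_deletion_ratio` are hypothetical, introduced here for illustration.

```python
# Indel rates as quoted in the text above (R1 = forward reads, R2 = reverse reads).
rates = {
    "R1": {"insertion": 0.000040, "deletion": 0.000017},
    "R2": {"insertion": 0.000043, "deletion": 0.000027},
}


def insertion_to_deletion_ratio(read_rates):
    """Return how many times more frequent insertions are than deletions."""
    return read_rates["insertion"] / read_rates["deletion"]


# One ratio per read direction, plus their unweighted mean.
ratios = {read: insertion_to_deletion_ratio(r) for read, r in rates.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for read, ratio in ratios.items():
    print(f"{read}: insertions are {ratio:.2f}x as frequent as deletions")
print(f"mean ratio: {mean_ratio:.2f}x")
```

With the quoted rates the ratio is about 2.35 for R1 and about 1.59 for R2, so the factor of two holds as an average across the two read directions and not for each direction individually.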