Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
Bottom Line: A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.
Affiliation: School of Engineering, University of Glasgow, Glasgow, UK email@example.com.
Mentions: Overlapping the reads yielded further significant reductions in the error rates. The best error removal was achieved by quality trimming the reads with Sickle, then applying BayesHammer for error correction, and finally overlapping the reads with PANDAseq (see Supplementary Figure S11 for details on PANDAseq). For the data sets displayed in Figure 10, the substitution error rates were reduced by 77–98%, with an average of 93.2%. Figure 11 compares the percentage of aligned reads for the most successful approaches. PANDAseq was able to align between 12 and 95% of the reads, with an average of 69% across all data sets.
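To make the reported figures concrete, the following is a minimal sketch of how a percentage reduction in substitution error rate could be computed per data set and then summarized. It assumes the average is the unweighted mean of the per-data-set reductions, which the text does not state. The function names and the example rates are hypothetical and purely illustrative; they are not values from the study.

```python
from statistics import mean


def percent_reduction(rate_before, rate_after):
    """Relative reduction of an error rate, in percent."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return 100.0 * (rate_before - rate_after) / rate_before


def summarize_reductions(rates):
    """Return (min, max, unweighted mean) of per-data-set reductions.

    `rates` is a list of (rate_before, rate_after) pairs, one per data set.
    """
    reductions = [percent_reduction(before, after) for before, after in rates]
    return min(reductions), max(reductions), mean(reductions)


# Hypothetical substitution error rates (errors per base), NOT from the paper:
# each pair is (raw reads, after trimming + correction + overlapping).
example_rates = [(0.010, 0.0020), (0.004, 0.0002), (0.020, 0.0010)]

low, high, avg = summarize_reductions(example_rates)
print(f"reduction range: {low:.1f}-{high:.1f}%, mean: {avg:.1f}%")
```

A weighted mean (for example, weighting each data set by its number of sequenced bases) would be an equally plausible summary and could give a different average.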