Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C - Nucleic Acids Res. (2015)

Bottom Line: A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer), followed by read overlapping (PANDAseq), as the most successful approach, reducing substitution error rates by 93% on average.

Affiliation: School of Engineering, University of Glasgow, Glasgow, UK mail@melanieschirmer.com.
Figure 10: The figure compares the error rates of the raw reads (R1+R2 rates) with those obtained from different error correction approaches: Trimming+BayesHammer, overlapping reads with PANDAseq and overlapping reads with PEAR. Only data sets for which at least 1000 reads aligned for all methods are included. Excluded data sets: 19–26, 52 and 53 (too few raw R1 reads aligned); 39–45 and 47 (too few raw R2 reads aligned).
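The inclusion rule in the caption keeps a data set only when every method reaches the 1000-aligned-read threshold. The following is a minimal sketch of that filter. It assumes a hypothetical mapping from data set ID to per-method aligned-read counts. The function name, the method labels and the example counts are invented for illustration and are not taken from the paper's code or data.

```python
MIN_ALIGNED = 1000  # threshold stated in the Figure 10 caption


def datasets_to_plot(aligned_counts, min_aligned=MIN_ALIGNED):
    """Return sorted IDs of data sets in which every method aligned >= min_aligned reads.

    aligned_counts is a hypothetical structure:
        {dataset_id: {method_name: number_of_aligned_reads}}
    A data set is dropped as soon as any single method falls below the threshold,
    mirroring the "at least 1000 reads aligned for all methods" rule.
    """
    return sorted(
        ds
        for ds, per_method in aligned_counts.items()
        if all(n >= min_aligned for n in per_method.values())
    )


# Invented counts, for illustration only (not from the study):
counts = {
    "A": {"raw_R1": 5000, "raw_R2": 4200, "pandaseq": 3000},
    "B": {"raw_R1": 800, "raw_R2": 3900, "pandaseq": 2000},  # raw R1 too low
}
print(datasets_to_plot(counts))
```

Requiring all methods to pass means every plotted error rate is estimated from a reasonably large number of aligned reads. The trade-off is that a data set is discarded entirely if only its raw R1 or raw R2 reads align poorly, which is why the caption lists separate R1 and R2 exclusions.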

Mentions: Overlapping the reads led to further significant improvements in the error rates. The best results in terms of error removal were achieved by quality trimming the reads with Sickle, then applying BayesHammer for error correction and finally overlapping the reads with PANDAseq (see Supplementary Figure S11 for details on PANDAseq). For the data sets displayed in Figure 10, the substitution error rates were reduced by 77–98%, with an average of 93.2%. Figure 11 compares the percentage of aligned reads for the most successful approaches. PANDAseq aligned between 12% and 95% of the reads, with an average of 69% across all data sets.
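The reduction figures above are relative reductions of the substitution error rate, computed as 100 × (raw rate − corrected rate) / raw rate. The following is a minimal sketch of that calculation and of a range/average summary. It assumes that the reported average is the mean of the per-data-set reductions rather than a reduction of pooled rates; the source does not say which. The function names and the example rates are hypothetical and invented for illustration.

```python
from statistics import mean


def percent_reduction(raw_rate, corrected_rate):
    """Relative reduction (%) of a substitution error rate after error correction."""
    if raw_rate <= 0:
        raise ValueError("raw error rate must be positive")
    return 100.0 * (raw_rate - corrected_rate) / raw_rate


def summarize(rates):
    """Return (min, max, mean) of the per-data-set percent reductions.

    rates is a hypothetical mapping {dataset_id: (raw_rate, corrected_rate)}.
    Assumption: the average is taken over data sets, each weighted equally.
    """
    reductions = [percent_reduction(raw, corr) for raw, corr in rates.values()]
    return min(reductions), max(reductions), mean(reductions)


# Invented rates, for illustration only (not from the study):
example = {"A": (0.004, 0.0002), "B": (0.010, 0.0010)}
print(summarize(example))
```

Averaging per data set gives every data set equal weight regardless of depth. Pooling all reads before computing a single reduction would instead let the deepest data sets dominate the result.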
