Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C - Nucleic Acids Res. (2015)

Bottom Line: A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%.


Affiliation: School of Engineering, University of Glasgow, Glasgow, UK mail@melanieschirmer.com.


Figure 2: Error profiles for insertions, deletions and unknown nucleotides (Ns): the first three graphs show the R1 error profiles. For insertions the colour identifies the inserted nucleotide and for deletions the colour refers to the type of nucleotide that was deleted. The lower three graphs display the error profiles for the R2 reads, respectively.
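As a rough illustration of how position-specific profiles of this kind could be tallied, the sketch below counts insertions (by inserted nucleotide), deletions (by deleted nucleotide) and Ns per read position. It is not the authors' pipeline: the input format (pairs of equal-length gapped strings with `-` marking a gap) and the function name `indel_profile` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def indel_profile(alignments):
    """Tally per-position error counts from gapped read/reference pairs.

    `alignments` is an iterable of (read, reference) strings of equal
    length, with '-' marking a gap (hypothetical input format).
    Positions are 0-based offsets into the ungapped read.
    """
    insertions = defaultdict(Counter)  # position -> inserted base -> count
    deletions = defaultdict(Counter)   # position -> deleted base -> count
    unknowns = Counter()               # position -> number of Ns
    coverage = Counter()               # position -> reads covering it

    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("aligned strings must have equal length")
        pos = 0
        for read_base, ref_base in zip(read, ref):
            if read_base == "-":
                # Reference base missing from the read: a deletion,
                # recorded at the read position it precedes.
                deletions[pos][ref_base] += 1
                continue
            coverage[pos] += 1
            if ref_base == "-":
                # Read base absent from the reference: an insertion.
                insertions[pos][read_base] += 1
            elif read_base == "N":
                unknowns[pos] += 1
            pos += 1
    return insertions, deletions, unknowns, coverage


# Toy alignments, invented for illustration only.
toy = [
    ("ACG-TA", "ACGTTA"),    # T deleted before read position 3
    ("ACGGTNA", "ACG-TTA"),  # G inserted at position 3, N at position 5
]
insertions, deletions, unknowns, coverage = indel_profile(toy)
```

Dividing each count by `coverage` at the same position would turn the tallies into per-position rates of the kind plotted in such a figure.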

Mentions: Figure 2 displays the position-specific insertion and deletion profiles as well as the distribution of unknown nucleotides (Ns) across all reads. As previously reported, insertion and deletion (indel) rates are ≈100× lower than the substitution rates. We also observed that insertions, with rates of 0.000040 and 0.000043 for R1 and R2 reads, respectively, are roughly twice as likely as deletions, for which we observed rates of 0.000017 and 0.000027 for R1 and R2 reads, respectively. Again, the majority of indels seem to concentrate around certain positions, with rates up to 225× higher than the average indel rate (see Table 2). The non-uniform distributions of unknown nucleotides (Ns) indicate that Ns also do not occur randomly.
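The insertion-to-deletion comparison can be checked directly from the four rates quoted above. The short sketch below does only that arithmetic; the pooled figure assumes R1 and R2 contribute equal numbers of bases, which the excerpt does not state.

```python
# Per-base error rates as quoted in the text.
rates = {
    "R1": {"insertion": 0.000040, "deletion": 0.000017},
    "R2": {"insertion": 0.000043, "deletion": 0.000027},
}


def insertion_to_deletion_ratio(read_rates):
    """How many times more frequent insertions are than deletions."""
    return read_rates["insertion"] / read_rates["deletion"]


ratios = {read: insertion_to_deletion_ratio(r) for read, r in rates.items()}

# Pooled ratio: assumes equal base counts for R1 and R2 (an assumption).
pooled = sum(r["insertion"] for r in rates.values()) / sum(
    r["deletion"] for r in rates.values()
)

for read, ratio in ratios.items():
    print(f"{read}: insertions are {ratio:.2f}x as frequent as deletions")
print(f"pooled: {pooled:.2f}x")
```

Under these figures the ratio is about 2.35 for R1, 1.59 for R2 and 1.89 pooled, so "twice as likely" is best read as an approximation across both reads.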

