FlowClus: efficiently filtering and denoising pyrosequenced amplicons.

Gaspar JM, Thomas WK - BMC Bioinformatics (2015)

Bottom Line: When used to analyze a mock community dataset, FlowClus produced a lower error rate compared to other denoising algorithms, while retaining significantly more sequence information. Many of the amplicon-based metagenomics datasets generated over the last several years have been processed through a denoising pipeline that likely caused deleterious effects on the raw data. Because of its efficiency, FlowClus can be used to re-analyze multiple large datasets together, thereby leading to more standardized conclusions.


Affiliation: Department of Molecular Cellular & Biomedical Sciences, University of New Hampshire, Durham, NH, USA. jsh58@unh.edu.

ABSTRACT

Background: Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities. In addition, they are limited by the size of the dataset and the sequencing technology used.

Results: FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. When used to analyze a mock community dataset, FlowClus produced a lower error rate compared to other denoising algorithms, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from all stages of 454 sequencing technology, as well as from Ion Torrent. It has processed a large dataset of 2.2 million GS-FLX Titanium reads in twelve hours; using its more efficient (but less precise) trie analysis option, this time was further reduced, to seven minutes.

Conclusions: Many of the amplicon-based metagenomics datasets generated over the last several years have been processed through a denoising pipeline that likely caused deleterious effects on the raw data. By using FlowClus, one can avoid such negative outcomes while maintaining control over the filtering and denoising processes. Because of its efficiency, FlowClus can be used to re-analyze multiple large datasets together, thereby leading to more standardized conclusions. FlowClus is freely available on GitHub (jsh58/FlowClus); it is written in C and supported on Linux.
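The abstract mentions a trie analysis option but does not describe how it works. As a generic illustration of the data structure only (not FlowClus's implementation, which is written in C), the following minimal sketch stores reads in a prefix trie and counts how many stored reads are equal to, or a prefix of, a query read. The names `TrieNode`, `build_trie`, and `prefix_matches` are hypothetical.

```python
class TrieNode:
    """One node of a prefix trie over nucleotide characters."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # base -> TrieNode
        self.count = 0      # number of reads that end exactly at this node


def build_trie(reads):
    """Insert every read into a fresh trie and return its root."""
    root = TrieNode()
    for read in reads:
        node = root
        for base in read:
            node = node.children.setdefault(base, TrieNode())
        node.count += 1
    return root


def prefix_matches(root, query):
    """Count stored reads that equal `query` or are a proper prefix of it.

    The walk touches at most len(query) nodes, regardless of how many
    reads are stored, which is why tries suit large read sets.
    """
    node = root
    total = 0
    for base in query:
        node = node.children.get(base)
        if node is None:
            break
        total += node.count
    return total


# Toy reads, invented for illustration only.
example_reads = ["ACGT", "ACG", "ACGT", "ACGA", "TT"]
example_root = build_trie(example_reads)
```

Because a lookup depends only on the length of the query, this kind of structure trades exactness of comparison for speed, which is consistent with the "more efficient (but less precise)" description above.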

Figure 3: Analyzing a mock community dataset. A comparison of the error rates (solid lines) and total sequence alignment length (dashed lines) of the Titanium mock community dataset (Quince et al. [6]) analyzed by FlowClus and AmpliconNoise.

Mentions: We examined the performance of FlowClus in correcting pyrosequencing errors in the Titanium mock community dataset of Quince et al. [6]. The set of original reads (“Stage 0”) was determined by filtering only for mid tag and primer sequences. The combined insertion and deletion error rate of these reads was just over 0.4% (Figure 3). We then filtered the reads with FlowClus using criteria similar to those recommended with the QIIME denoising pipeline. This resulted in a drop in the error rate by more than half, while losing 11.5% of the sequence information. We denoised the reads by clustering, using a constant 0.90 as the denoising distance, which was the largest value that did not cause a significant (>5%) change in the substitution error rate. After denoising, the in/del error rate was further reduced, to less than 0.1%.
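To make the quantities in this paragraph concrete, here is a minimal sketch of how combined insertion/deletion and substitution error rates could be computed from reads aligned to their known mock community references. It assumes each read and reference arrive as equal-length gapped strings with `-` marking a gap, and that rates are expressed per alignment column; the excerpt does not state the exact denominator the authors used, so that choice and the function name `error_rates` are assumptions.

```python
def error_rates(aligned_pairs):
    """Compute error rates from (read, reference) gapped alignment pairs.

    A gap in the reference is an insertion in the read, a gap in the read
    is a deletion, and two differing bases are a substitution.
    Assumption: rates are divided by the total number of alignment columns.
    """
    insertions = deletions = substitutions = columns = 0
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("aligned strings must have equal length")
        for r, f in zip(read, ref):
            if r == "-" and f == "-":
                continue  # an all-gap column carries no information
            columns += 1
            if f == "-":
                insertions += 1
            elif r == "-":
                deletions += 1
            elif r != f:
                substitutions += 1
    if columns == 0:
        raise ValueError("no aligned columns to evaluate")
    return {
        "indel": (insertions + deletions) / columns,
        "substitution": substitutions / columns,
        "aligned_length": columns,
    }


# Toy alignments, invented for illustration only.
example_pairs = [
    ("ACGTTA", "ACG-TA"),  # one insertion
    ("AC-TA", "ACGTA"),    # one deletion
    ("ACGAA", "ACGTA"),    # one substitution
]
example_rates = error_rates(example_pairs)
```

Tracking `aligned_length` alongside the rates mirrors the two quantities compared in Figure 3: a filtering step can lower the error rate simply by discarding reads, so the amount of sequence retained has to be reported with it.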

