AmpliconDuo: A Split-Sample Filtering Protocol for High-Throughput Amplicon Sequencing of Microbial Communities.

Lange A, Jost S, Heider D, Bock C, Budeus B, Schilling E, Strittmatter A, Boenigk J, Hoffmann D - PLoS ONE (2015)

Bottom Line: Further, we discard sequences that are not found in both branches ("AmpliconDuo filter"). The filter does not distort overall apparent community compositions. Finally, we quantitatively explain the effect of the AmpliconDuo filter by a simple mathematical model.


Affiliation: Research Group Bioinformatics, Faculty of Biology, University of Duisburg-Essen, Essen, Germany.

ABSTRACT
High throughput sequencing (HTSeq) of small ribosomal subunit amplicons has the potential for a comprehensive characterization of microbial community compositions, down to rare species. However, the error-prone nature of the multi-step experimental process requires that the resulting raw sequences are subjected to quality control procedures. These procedures often involve an abundance cutoff for rare sequences or clustering of sequences, both of which limit genetic resolution. Here we propose a simple experimental protocol that retains the high genetic resolution granted by HTSeq methods while effectively removing many low abundance sequences that are likely due to PCR and sequencing errors. According to this protocol, we split samples and submit both halves to independent PCR and sequencing runs. The resulting sequence data is graphically and quantitatively characterized by the discordance between the two experimental branches, allowing for a quick identification of problematic samples. Further, we discard sequences that are not found in both branches ("AmpliconDuo filter"). We show that the majority of sequences removed in this way, mostly low abundance but also some higher abundance sequences, show features expected from random modifications of true sequences as introduced by PCR and sequencing errors. On the other hand, the filter retains many low abundance sequences observed in both branches and thus provides a more reliable census of the rare biosphere. We find that the AmpliconDuo filter increases biological resolution as it increases apparent community similarity between biologically similar communities, while it does not affect apparent community similarities between biologically dissimilar communities. The filter does not distort overall apparent community compositions. Finally, we quantitatively explain the effect of the AmpliconDuo filter by a simple mathematical model.
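The core filtering rule described in the abstract (keep a sequence only if it is found in both branches) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ampliconduo_filter`, the dictionary representation of read counts, and the toy sequences and counts are all assumptions made for the example.

```python
from collections import Counter


def ampliconduo_filter(branch_a, branch_b):
    """Keep only sequences with at least one read in BOTH branches.

    branch_a, branch_b: mappings from sequence string to read count
    (hypothetical representation chosen for this sketch).
    Returns the filtered counts for branch A and branch B.
    """
    shared = {
        seq for seq, count in branch_a.items()
        if count > 0 and branch_b.get(seq, 0) > 0
    }
    kept_a = {seq: branch_a[seq] for seq in shared}
    kept_b = {seq: branch_b[seq] for seq in shared}
    return kept_a, kept_b


# Toy read counts, made up for illustration (not data from the paper).
reads_a = Counter({"ACGTAC": 120, "ACGTAA": 1, "TTGCAA": 3})
reads_b = Counter({"ACGTAC": 98, "TTGCAA": 2, "GGGCAT": 1})

kept_a, kept_b = ampliconduo_filter(reads_a, reads_b)
print(sorted(kept_a.items()))  # low-abundance "TTGCAA" survives; singletons seen in one branch do not
```

Note that the rule uses no abundance cutoff: in the toy data a sequence with only 3 and 2 reads is retained because both branches observed it, while the singletons seen in one branch only are dropped.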



Fig 5 (pone.0141590.g005): Distribution of the probability p_art of artificial random mutations. Each dot corresponds to one p_art value computed for one experimental branch A or B according to Eq (6). In the plot, p_art values are binned in intervals of 1/30 of their total range. Eukaryotes and metazoans (first two columns) have both been analyzed with the same single-read protocol, and the mean p_art values of these two groups are not significantly different. For the prokaryotic samples, which have been analyzed with a paired-end protocol, p_art is higher.

Mentions: If our model is correct, p_art will mainly depend on the experimental protocol, i.e. it will be roughly the same for different samples as long as the same experimental protocol is used. Fig 5 supports this model: for each of the two protocols we can estimate a mean p_art, and the 95% confidence intervals of the two means do not overlap. For the single-read protocol (eukaryotes) we have p_art = (1.9 ± 0.4) × 10^-4. For the paired-end protocol (prokaryotes) we have p_art = (3.3 ± 0.7) × 10^-4. A Brunner-Munzel test between the two p_art distributions yields a p-value of 5.7 × 10^-5 for the hypothesis of equal means of the two distributions. Cohen's d is estimated at 1.79 ± 1.1, indicating a medium to large effect of the experimental protocol on p_art.
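The summary statistics quoted above (a group mean with a 95% confidence interval, and Cohen's d between two groups) can be illustrated with a short Python sketch. The excerpt does not say how the intervals or the effect size were computed, so the normal-approximation interval (1.96 standard errors) and the pooled-standard-deviation form of Cohen's d used here are assumptions, and the input values are made-up toy numbers rather than the paper's data. For the Brunner-Munzel test itself, SciPy provides `scipy.stats.brunnermunzel`; it is left out here to keep the sketch dependent on the standard library only.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled_var)


# Toy p_art values in units of 10^-4, invented for illustration only.
single_read = [1.0, 2.0, 3.0]
paired_end = [3.0, 4.0, 5.0]

mean_sr, half_sr = mean_ci95(single_read)
mean_pe, half_pe = mean_ci95(paired_end)
d = cohens_d(single_read, paired_end)

print(f"single-read: {mean_sr:.2f} ± {half_sr:.2f}")
print(f"paired-end:  {mean_pe:.2f} ± {half_pe:.2f}")
print(f"Cohen's d:   {d:.2f}")
```

With only a handful of values per group, a t-based interval would be wider than the normal approximation used in this sketch.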
