Increased efficiency in identifying mixed pollen samples by meta-barcoding with a dual-indexing approach.

Sickel W, Ankenbrand MJ, Grimmer G, Holzschuh A, Härtel S, Lanzen J, Steffan-Dewenter I, Keller A - BMC Ecol. (2015)

Bottom Line: The new primer-adapter design does not require further adapter ligation steps after amplification. This study thus offers improvements for the laboratory and bioinformatical workflow to existing approaches regarding data quantity and quality as well as processing effort and cost-effectiveness. Although only tested for pollen samples, it is furthermore applicable to other research questions requiring plant identification in mixed and challenging samples.


Affiliation: Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Am Hubland, 97074, Würzburg, Germany. wiebke.sickel@uni-wuerzburg.de.

ABSTRACT

Background: Meta-barcoding of mixed pollen samples constitutes a suitable alternative to conventional pollen identification via light microscopy. Current approaches, however, are limited in practicability by low sample throughput and/or inefficient processing methods, e.g. separate steps for amplification and sample indexing.

Results: We thus developed a new primer-adapter design for high-throughput sequencing with the Illumina technology that remedies these issues. It uses a dual-indexing strategy, where sample-specific combinations of forward and reverse identifiers attached to the barcode marker allow high sample throughput with a single sequencing run. It does not require further adapter ligation steps after amplification. We applied this protocol to 384 pollen samples collected by solitary bees and sequenced all samples together on a single Illumina MiSeq v2 flow cell. According to rarefaction curves, 2,000–3,000 high-quality reads per sample were sufficient to assess the complete diversity of 95% of the samples. We were able to detect 650 different plant taxa in total, of which 95% were classified at the species level. Together with the laboratory protocol, we also present an update of the reference database used by the classifier software, which increases the number of plant species covered by the database from 37,403 to 72,325 (93% increase).

Conclusions: This study thus improves on existing approaches in both the laboratory and the bioinformatic workflow with regard to data quantity and quality as well as processing effort and cost-effectiveness. Although only tested for pollen samples, it is furthermore applicable to other research questions requiring plant identification in mixed and challenging samples.
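
As a rough illustration of the dual-indexing idea described in the Results, the sketch below shows how a small set of forward and reverse identifiers can be combined into one unique tag per sample. The 16 x 24 layout, the index names and the sample names are assumptions chosen only so that the product equals 384; they are not the identifiers used in the study.

```python
from itertools import product
from typing import Optional

# Hypothetical identifier names; the study's actual index sequences are not given here.
forward_ids = [f"F{i:02d}" for i in range(1, 17)]  # 16 forward identifiers (assumed)
reverse_ids = [f"R{i:02d}" for i in range(1, 25)]  # 24 reverse identifiers (assumed)

# Each sample receives one unique (forward, reverse) combination.
sample_by_index_pair = {
    pair: f"sample_{n:03d}"
    for n, pair in enumerate(product(forward_ids, reverse_ids), start=1)
}


def assign_sample(forward_id: str, reverse_id: str) -> Optional[str]:
    """Return the sample for a read's index pair, or None if the pair is unused."""
    return sample_by_index_pair.get((forward_id, reverse_id))


n_samples = len(sample_by_index_pair)
n_oligos = len(forward_ids) + len(reverse_ids)
print(n_samples, n_oligos)
```

Under these assumed numbers, 40 indexed primers are enough to tag 384 samples, whereas a single-index design would need one distinct identifier per sample.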

Figure 2: Species accumulation curves. (a) Osmia bicornis samples; (b) Osmia truncorum samples. The x-axis was limited to 5,000 reads as the saturation of all samples was below this threshold. The y-axis was limited to 90 taxa in both plots to obtain the same scale. Taxa accounting for less than 0.1% of total sample reads were excluded.
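
The curves in the figure can be approximated for a single sample with a short sketch. It first drops taxa below 0.1% of the sample's reads, as stated in the caption, and then computes the expected number of taxa at a given read depth with the standard analytic rarefaction formula. The use of this formula, the function names and the toy counts are assumptions for illustration; the source does not state which software or method produced the published curves.

```python
from math import comb


def filter_rare_taxa(counts, min_fraction=0.001):
    """Keep only taxa that make up at least min_fraction of the sample's reads."""
    total = sum(counts.values())
    return {taxon: c for taxon, c in counts.items() if c / total >= min_fraction}


def expected_taxa(counts, depth):
    """Expected number of taxa seen when drawing `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    denom = comb(total, depth)
    # For each taxon: 1 minus the probability that none of its reads is drawn.
    return sum(1 - comb(total - c, depth) / denom for c in counts.values())


# Toy read counts per taxon for one hypothetical sample.
toy_counts = {"A": 9979, "B": 20, "C": 1}
filtered = filter_rare_taxa(toy_counts)  # "C" falls below the 0.1% cut-off

# Accumulation curve: expected taxa at increasing read depths.
curve = [(d, expected_taxa(filtered, d)) for d in (10, 100, 1000, 5000)]
for depth, taxa in curve:
    print(depth, round(taxa, 2))
```

A sample counts as saturated once the curve stops rising with additional reads, which is the criterion behind the 2,000–3,000 read estimate.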

Mentions: In total, we obtained 11,624,087 raw ITS2 reads (PhiX excluded), an average of 30,271 reads per sample [standard deviation (SD): 11,373; median: 30,900]. After data processing (removal of low-quality reads <Q20, short reads <150 bp and reads with ambiguous base pairs), a mean of 15,580 (SD 6,598; median 15,740) reads per sample remained. Species accumulation curves (Figure 2) show that almost all samples were sequenced to saturation after approximately 2,000–3,000 high-quality reads. Based on the ratio of raw to high-quality reads, this corresponds to approximately 4,000–6,000 raw reads required per sample. Per sample, pollen from bee brood cells originated from between one and 85 different plant species (Figure 2). Five per cent of samples (19) yielded fewer than 2,000 reads (minimum saturation threshold, Figure 2) and were removed prior to further analysis. Raw sequences are accessible via the EBI-SRA under the project accession number PRJEB8640.
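
The conversion from high-quality to raw reads can be retraced with a few lines of arithmetic. The sketch below uses only the figures quoted in the paragraph above; the variable and function names are illustrative, and scaling by the ratio of the two means is an assumption about how the estimate was derived.

```python
# Figures quoted in the text.
total_raw_reads = 11_624_087
n_samples = 384
mean_raw_reads = 30_271   # mean raw reads per sample
mean_hq_reads = 15_580    # mean high-quality reads per sample after filtering
removed_samples = 19

# Raw reads needed per high-quality read, on average (about 1.94).
raw_per_hq = mean_raw_reads / mean_hq_reads


def raw_reads_needed(hq_reads):
    """Scale a high-quality read target to the raw reads needed on average."""
    return hq_reads * raw_per_hq


low = raw_reads_needed(2_000)   # roughly 3,900 raw reads
high = raw_reads_needed(3_000)  # roughly 5,800 raw reads
removed_share = removed_samples / n_samples  # just under 5%

print(round(total_raw_reads / n_samples), round(low), round(high), f"{removed_share:.1%}")
```

Both bounds round to the 4,000–6,000 raw reads stated in the text, and 19 of 384 samples matches the reported five per cent.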

