Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform.

Shokralla S, Porter TM, Gibson JF, Dobosz R, Janzen DH, Hallwachs W, Golding GB, Hajibabaei M - Sci Rep (2015)

Bottom Line: Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina MiSeq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%.


Affiliation: Department of Integrative Biology and Biodiversity Institute of Ontario, University of Guelph, 50 Stone Road East, Guelph, ON, Canada N1G 2W1.

ABSTRACT
Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina MiSeq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, for deeper analysis of genetic variation in target gene regions.


Figure 1: Results of both Sanger and Illumina MiSeq sequencing of 1,010 individual arthropods from a single Malaise trap sample. (A) Overall success of generating COI DNA sequences via Sanger sequencing for each of eleven 96-well specimen plates. (B) Overall success of generating COI DNA sequences via Illumina MiSeq sequencing for each of eleven plates. For (A) and (B), the number of individuals per plate producing a COI sequence is shaded dark below, with unsuccessful individuals above. (C) Number of unique COI DNA sequences produced via Illumina MiSeq sequencing for each individual.

Mentions: The standard 5′ end of the COI region was amplified for each individual DNA template using the primers LCO1490 and HCO2198 [27]. These amplicons were sequenced via standard Sanger protocols. A total of 537 individuals (53.2%) produced a full-length (>500 bp) sequence via Sanger sequencing (Fig. 1A). Sanger sequencing success ranged from 12.0% (plate 9) to 91.3% (plate 4). A total of 983 individuals (97.3%) produced at least one full-length sequence via Illumina MiSeq sequencing (Fig. 1B).
For MiSeq sequencing, the same region of COI was amplified for all individual DNA templates in two smaller, overlapping fragments using the primer sets Ill_LCO1490 × Ill_C_R (the FC fragment) and Ill_B_F × Ill_HCO2198 (the BR fragment), respectively. The two fragments overlap by 82 bp. All amplicons were dual indexed with unique 5-mer multiple identifiers (MIDs) from both directions. The amplicons were then pooled in groups, dual indexed a second time, and sequenced on half of a single Illumina MiSeq flowcell using a V3 MiSeq sequencing kit (300 bp × 2).
A total of 18,873,718 Illumina paired-end reads were filtered for quality and length. Across the eleven 96-well plates, 10,480,349 raw paired-end reads were sequenced for the FC fragment (mean 952,759 reads per plate) and 8,393,369 for the BR fragment (mean 763,034 reads per plate). For each plate, the raw paired-end reads were merged, separately for the FC and BR fragments, with a minimum overlap of 25 bp. A total of 9,652,825 paired FC reads (mean 877,530 per plate) and 6,020,424 paired BR reads (mean 547,311 per plate) were retained for further processing. After MID sorting and primer trimming, putative chimeric sequences were removed, along with identical duplicate sequences, using a 99% sequence similarity cutoff.
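The read counts and success rates quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline: the numbers are copied from the text, while the variable and function names (`raw`, `merged`, `summarize`) are illustrative assumptions.

```python
# Figures copied from the text; all names here are illustrative, not the authors'.
PLATES = 11
SPECIMENS = 1010

raw = {"FC": 10_480_349, "BR": 8_393_369}      # raw paired-end reads per fragment
merged = {"FC": 9_652_825, "BR": 6_020_424}    # reads retained after merging
successes = {"Sanger": 537, "MiSeq": 983}      # individuals with a full-length sequence


def summarize(raw_counts, merged_counts, plates):
    """Per-fragment means per plate and the fraction of raw reads retained."""
    rows = {}
    for frag in raw_counts:
        rows[frag] = {
            "raw_mean_per_plate": round(raw_counts[frag] / plates),
            "merged_mean_per_plate": round(merged_counts[frag] / plates),
            "merged_fraction": merged_counts[frag] / raw_counts[frag],
        }
    return rows


summary = summarize(raw, merged, PLATES)
total_raw = sum(raw.values())
success_pct = {k: round(100 * v / SPECIMENS, 1) for k, v in successes.items()}

print(total_raw)
print(summary)
print(success_pct)
```

Under these inputs the two fragment totals sum to the 18,873,718 reads reported, and the per-plate means and success percentages round to the values given in the text. The retained fraction after merging is a derived quantity not stated in the source: roughly 92% for FC and 72% for BR.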
The two fragments of each individual were paired, requiring a minimum overlap of 80 bp; a mismatch rate of at most 2% (0.02) was allowed in the overlap region. An average of 5,868 (range 5,166–6,577) full-length sequences were produced for each individual. Following de-replication of identical sequences, the number of unique, abundant sequences (>10% of total sequences per individual) recovered for each individual ranged from zero to six. Illumina MiSeq sequencing success ranged from 92.2% (plate 10) to 100% (plates 2, 4, and 8). A total of 794 individuals (78.6%) produced exactly one unique full-length assembled COI sequence via Illumina MiSeq sequencing (Fig. 1C).
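The fragment pairing and abundance filtering described above can be illustrated with a small sketch. This is not the software the authors used, which this excerpt does not name. It is a plain-Python illustration that assumes the FC sequence is the 5′ fragment, the BR sequence is the 3′ fragment, both are in the same orientation, and the overlap is ungapped. The function names and the longest-overlap-first search are assumptions made for illustration; the thresholds (80 bp, 2%, >10%) come from the text.

```python
from collections import Counter


def join_fragments(fc, br, min_overlap=80, max_mismatch=0.02):
    """Join a 5' fragment (fc) and a 3' fragment (br) at their overlap.

    Tries the longest candidate overlap first (an illustrative choice) and
    accepts the first one whose mismatch fraction is <= max_mismatch.
    Returns the assembled sequence, or None if no overlap qualifies.
    """
    longest = min(len(fc), len(br))
    for k in range(longest, min_overlap - 1, -1):
        tail, head = fc[-k:], br[:k]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches / k <= max_mismatch:
            # Keep the FC bases in the overlap, then append the rest of BR.
            return fc + br[k:]
    return None


def abundant_unique(seqs, min_fraction=0.10):
    """De-replicate identical sequences and keep those above min_fraction."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [s for s, n in counts.most_common() if n / total > min_fraction]


# Toy example with a lowered overlap threshold so the sequences stay short.
toy = join_fragments("ACGTACGTAAGGCC", "AAGGCCTTTTGGGG", min_overlap=6)
print(toy)  # ACGTACGTAAGGCCTTTTGGGG

reads = ["AAA"] * 6 + ["CCC"] * 3 + ["GGG"] * 1
print(abundant_unique(reads))  # ['AAA', 'CCC']
```

In the second call, the sequence seen in exactly 10% of reads is dropped because the filter is strict (>10%), matching the wording in the text. An individual with no qualifying sequence would yield an empty list, and one with exactly one entry corresponds to the "exactly one unique sequence" category counted in Fig. 1C.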

