Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DR - PLoS ONE (2015)

Bottom Line: Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples.


Affiliation: Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America.

ABSTRACT
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
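
The recovery figure quoted above (fragments at least 50% of the target length for 34 of 67 gene fragments) is a simple threshold count. The following is a minimal sketch of that kind of tally, not the authors' pipeline: the function name `count_recovered` and the lengths in the example are hypothetical, and only the gene names are taken from the article.

```python
def count_recovered(target_lengths, recovered_lengths, min_fraction=0.5):
    """Count target fragments whose recovered length is at least
    min_fraction of the target length.

    Both arguments map a gene-fragment name to a length in nucleotides.
    A fragment missing from recovered_lengths counts as length 0.
    """
    count = 0
    for name, target_len in target_lengths.items():
        recovered = recovered_lengths.get(name, 0)
        if target_len > 0 and recovered / target_len >= min_fraction:
            count += 1
    return count


# Hypothetical lengths for illustration only (not data from the study).
targets = {"CAD": 2000, "ArgK": 700, "Topo": 750}
recovered = {"CAD": 1200, "ArgK": 300}

print(count_recovered(targets, recovered), "of", len(targets))
```

With the real per-fragment lengths for a specimen, the same call would give the "N of 67" style summary reported in the abstract.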


Fig 11 (pone.0143929.g011): A portion of the maximum likelihood tree of carabids for CAD with all contigs included from the de novo assembly. This is an example in which our BLAST searches for target genes within HTS museum specimen assemblies returned multiple contigs, but our filtering criteria failed to accept a best contig, despite two contigs falling in the prediction group (shown in pink) and being nearly identical to the PCR-based sequence of a conspecific specimen (Bembidion “Inuvik” 3984).

Mentions: As noted above, for some carabid samples, de novo assemblies contained multiple contigs that BLASTed only to beetles and that contained no stop codons (S4 Table). The phylogenetic analyses containing all contigs (S5 Fig) show various patterns of relationships between the multiple contigs in a sample. In many cases in which there was more than one contig for a sample, the different contigs formed a clade in the maximum likelihood tree (e.g., Bembidion lachnophoroides 3022 for Topo, Fig 9). Some had contigs scattered around the tree, but with the contig chosen by our selection process falling where predicted (e.g., Bembidion lapponicum 3974 and B. lachnophoroides 3022 for 28S, Fig 10). For some samples, some of the contigs that were not chosen were not where predicted and were extremely divergent (see, for example, the B. lachnophoroides COI contigs in S5 Fig). A third pattern is shown by Bembidion “Inuvik” 3984 for CAD (Fig 11): the two contigs appeared in the tree exactly where predicted, but our selection process failed to choose one over the other, and thus de novo assembly for this sample for CAD was judged to be a failure. In a very few cases none of the contigs were inferred to be where predicted in the phylogeny, including the single chosen contig (see, for example, Bembidion sp. nr. transversale 3205 in ArgK, Fig 12). In most cases, however, the chosen contig was inferred to fall where predicted in the phylogeny, or at least not strongly supported to fall in a contradictory place (S7 Fig).
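
One of the screens mentioned above is that candidate contigs contained no stop codons. As a rough illustration of such a check, and not the authors' actual filter, the sketch below tests whether a contig has at least one of its six reading frames free of stop codons. The function names are hypothetical, and the standard genetic code is assumed; mitochondrial genes such as COI use a different stop-codon set and would need a different table.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement; unknown bases become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def has_stop_free_frame(seq):
    """True if at least one of the six reading frames has no stop codon."""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            # Walk complete codons only, starting at this frame offset.
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            if not any(codon in STOP_CODONS for codon in codons):
                return True
    return False


print(has_stop_free_frame("ATGGCCAAA"))
print(has_stop_free_frame("CTAG" * 6))
```

A check like this only screens out contigs that cannot encode an uninterrupted protein fragment. As the Bembidion “Inuvik” 3984 case shows, it does not by itself decide between two contigs that both pass.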

