Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DR - PLoS ONE (2015)

Bottom Line: Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples.

Affiliation: Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America.

ABSTRACT
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
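
As a rough check on the scale of these figures, the percentages quoted for the 56-year-old beetle can be converted into approximate nucleotide counts. The short Python sketch below uses only the numbers stated in the abstract (about 63% and 70% of approximately 41,900 nucleotides across 67 fragments). Because the abstract gives rounded values, the results are approximations, and the function and variable names are illustrative assumptions rather than anything used by the authors.

```python
# Figures quoted in the abstract; both the target size and the
# percentages are stated as approximate, so results are approximate too.
TARGET_NT = 41_900      # approximate total length of the target suite
N_FRAGMENTS = 67        # number of nuclear protein-coding gene fragments


def approx_recovered(fraction, target=TARGET_NT):
    """Approximate number of nucleotides recovered for a fraction of the target."""
    return round(fraction * target)


de_novo_nt = approx_recovered(0.63)      # de novo assembly
reference_nt = approx_recovered(0.70)    # reference-based assembly
mean_fragment_nt = TARGET_NT / N_FRAGMENTS

print(f"de novo:         ~{de_novo_nt} nt")
print(f"reference-based: ~{reference_nt} nt")
print(f"mean fragment:   ~{mean_fragment_nt:.0f} nt")
```

Under these rounded inputs, the reference-based assembly corresponds to roughly 2,900 more nucleotides than the de novo assembly for that specimen.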



Fig 6 (pone.0143929.g006): Recovery success of 67 low-copy nuclear protein-coding gene fragments in HTS museum specimens. Darkness of cell corresponds to percentage of the length of that fragment that was recovered, with black cells corresponding to 100% recovery. Gene fragments are ordered by average recovery as measured across both de novo and reference-based assemblies. Gene abbreviations are those used in Regier et al. [25]. Specimen abbreviations: Lag: Lagriinae n. gen. KK0290, subf: Bembidion subfusum 3977, snt1: B. sp. nr. transversale 3021, Lchi: Lionepha chintimini 4002, lach: B. lachnophoroides 3022, Bdrs: Bembidarenas 3983, ori1: B. orion 2831, inu1: B. "Inuvik" 3285, lapp: B. lapponicum 3974, aric: B. "Arica" 3242, dspt: B. cf. "Desert Spotted" 3978, mus: B. musae 3239, inu2: B. "Inuvik" 3984, ori2: B. orion 3079, snt2: B. sp. nr. transversale 3205. Four specimens with less than 34 million reads have specimen abbreviation and age shown in gray. Numbers under the specimen abbreviations are years between death and extraction.

Mentions: Recovery success of the 67 nuclear protein-coding gene fragments from Regier et al. [25] is summarized in Fig 6 and Table 12, with numerical values provided in S7 and S8 Tables. In general, reference-based assembly recovered more and longer gene fragments from the set of 67 gene fragments than de novo assembly, with an average increase in recovered bases across all gene fragments and all specimens of 14%. The four specimens with reduced reads performed worse than the remaining specimens, and failed to recover even partial fragments of most target genes in de novo assemblies, although recovery improved for those specimens in reference-based assemblies (Fig 6, S8 Table). Of the twelve carabid museum specimens, all but one showed an increase in the average recovery in the reference-based assembly relative to the de novo assembly, with increases in additional bases recovered ranging from a low of 5% in Bembidion subfusum to a high of 31% in Bembidion sp. nr. transversale and 34% in Bembidion cf. “Desert Spotted” (S10 Fig). The one exception was Bembidion “Arica”, which showed a decrease in recovery in the reference-based assembly, having 3% fewer bases recovered on average across the gene fragments. Within the museum carabids, there were no apparent patterns with respect to the age of specimens and recovery success, nor were there many gene fragments that were equally recovered across specimens (Fig 6).
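
The recovery measures discussed above (percentage of each fragment's length recovered, number of fragments at or above 50% of target length, and the change between de novo and reference-based assemblies) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The fragment names and lengths are hypothetical toy values, not data from the paper, and the function names are assumptions made for illustration. The text does not say whether the reported increases are relative changes or percentage-point differences, so the sketch computes both.

```python
def recovery_fraction(recovered, target):
    """Fraction of each target fragment's length that was recovered (capped at 1.0).

    Fragments missing from `recovered` count as zero recovery.
    """
    return {name: min(recovered.get(name, 0) / length, 1.0)
            for name, length in target.items()}


def count_at_least(fractions, threshold=0.5):
    """Number of fragments recovered to at least `threshold` of target length."""
    return sum(1 for f in fractions.values() if f >= threshold)


# Hypothetical toy data: invented names and lengths, NOT values from the paper.
target = {"fragA": 600, "fragB": 800, "fragC": 500, "fragD": 1000}
de_novo = {"fragA": 300, "fragB": 200, "fragD": 1000}
reference = {"fragA": 450, "fragB": 400, "fragC": 100, "fragD": 1000}

dn = recovery_fraction(de_novo, target)
rb = recovery_fraction(reference, target)

total_target = sum(target.values())
dn_bases = sum(de_novo.values())
rb_bases = sum(reference.values())

# Two possible readings of "increase in recovered bases":
relative_increase = (rb_bases - dn_bases) / dn_bases   # relative to de novo
point_increase = (rb_bases - dn_bases) / total_target  # share of target length

print("fragments >= 50% (de novo):        ", count_at_least(dn))
print("fragments >= 50% (reference-based):", count_at_least(rb))
print(f"relative increase:         {relative_increase:.1%}")
print(f"percentage-point increase: {point_increase:.1%}")
```

A heat map such as Fig 6 is a plot of these per-fragment fractions, with one column per specimen and assembly method.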

