Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DR - PLoS ONE (2015)

Bottom Line: Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples.


Affiliation: Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America.

ABSTRACT
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.


pone.0143929.g013: A maximum likelihood tree of carabids from seven focal genes and “Three Separate” assembly sequences. The placement of the DeNovo, NearRef, and FarRef sequences is shown relative to their prediction groups in a concatenated analysis of seven focal genes. Each prediction group is marked by a black arrow, and a unique color for branches and taxon names of all specimens in the prediction group. The placement of the three assembly sequences is indicated with a black star.

Mentions: In the phylogenetic analysis of the seven-gene concatenated matrix, the concatenated, multi-gene DeNovo, NearRef, and FarRef sequences for all museum specimens were inferred in positions (Fig 13; S6 and S7 Figs) consistent with our predictions (Table 8). In addition to being inferred with their predicted group, DeNovo, NearRef, and FarRef sequences formed a clade for nine of 12 museum specimens (see also Table 13); seven of these clades were strongly supported, with bootstrap support over 90%. For the four specimens in which DeNovo, NearRef, and FarRef sequences did not form a clade, an interfering sequence from a conspecific specimen or very closely related species disrupted the clade. For Bembidarenas 3983, the FarRef sequence fell outside of a moderately supported clade (bootstrap support = 82%) that included the DeNovo and NearRef sequences, as well as the PCR-based sequences from two other Bembidarenas specimens in the matrix (Bembidarenas reicheellum #1 and #2). For B. “Inuvik” 3285 and B. “Inuvik” 3984, the NearRef sequence fell outside of a poorly supported clade (bootstrap support = 56%) that included the DeNovo and FarRef sequences of B. “Inuvik” 3285 and B. “Inuvik” 3984, as well as the PCR-based sequence from B. “Inuvik” 3984. For B. orion 3079, the DeNovo sequence fell outside of a poorly supported clade (bootstrap support = 53%) that included the NearRef and FarRef sequences of B. orion 3079 and B. orion 2831, as well as the PCR-based sequence from B. orion 3079.

