Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DR - PLoS ONE (2015)

Bottom Line: Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples.

Affiliation: Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America.

ABSTRACT
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and reference-based assembly recovered 70%. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained by PCR and Sanger sequencing, including sequences from fresh conspecific specimens, and through phylogenetic analyses that tested whether the sequences were placed in the predicted regions of the phylogeny. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples.
Although our sample sizes are small, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with smaller beetles sequenced more successfully.
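As a rough check, the abstract's percentages can be converted into counts. The sketch below treats the stated total of approximately 41,900 nucleotides as exact, so the resulting nucleotide counts are ballpark values, not numbers reported by the authors; the variable and function names are illustrative only.

```python
# Ballpark arithmetic behind the abstract's percentages.
# Assumption: the "approximately 41,900" nucleotide total is used as if exact.

TARGET_NT = 41_900   # approximate total length of the target suite
N_FRAGMENTS = 67     # nuclear protein-coding gene fragments in the suite


def nucleotides_recovered(fraction: float, total: int = TARGET_NT) -> int:
    """Convert a recovered fraction of the target suite into a nucleotide count."""
    return round(fraction * total)


# 56-year-old, 4.4 mm beetle: about 63% (de novo) and 70% (reference-based).
de_novo_nt = nucleotides_recovered(0.63)
reference_nt = nucleotides_recovered(0.70)

# Least successful carabid: 34 of 67 fragments reached >= 50% of target length.
worst_fraction = 34 / N_FRAGMENTS

print(de_novo_nt, reference_nt, f"{worst_fraction:.1%}")  # prints: 26397 29330 50.7%
```

Even the least successful carabid therefore reached at least half the target length for just over half of the fragments.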

Fig 16. Squared correlation coefficients from univariate linear regression analyses between success measures and potential explanatory variables. Measures of success at acquiring protein-coding gene fragments are NPDN50 (de novo assembly; percent of gene fragments for which at least 50% of the bases were recovered), NPDN80 (same, but at least 80% of the bases), NPRef50 (reference-based assembly; percent of gene fragments for which at least 50% of the bases were recovered), and NPRef80 (same, but at least 80% of the bases). On the left are analyses with all samples included; on the right are analyses including only samples with more than 60 million reads. Symbols outlined in red indicate that the correlation is significant in a single-variable analysis; symbols outlined in pale pink indicate that the correlation is significant as a secondary variable in a bivariate analysis. Note that the x-axis orientations are mirrored in the two graphs.
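The four success measures share a single definition and differ only in assembly method and threshold. A minimal Python sketch follows. It assumes each fragment's recovery has already been reduced to a fraction of target length. The helper `fraction_recovered`, and its convention that '-' or 'N' marks an unrecovered position, are hypothetical, since the caption does not say how recovered bases were counted.

```python
from typing import Sequence


def fraction_recovered(aligned_seq: str) -> float:
    """Hypothetical helper: fraction of target positions with a called base.

    Assumes the assembled fragment is already aligned to its target, so the
    string length equals the target length and '-' or 'N' marks positions
    that were not recovered.
    """
    if not aligned_seq:
        raise ValueError("empty alignment")
    called = sum(1 for base in aligned_seq.upper() if base in "ACGT")
    return called / len(aligned_seq)


def percent_fragments_recovered(fractions: Sequence[float], threshold: float) -> float:
    """Percent of target fragments whose recovered fraction is >= threshold.

    threshold=0.5 mirrors an NP..50 measure; threshold=0.8 an NP..80 measure.
    Unrecovered fragments must be passed as 0.0 so that the denominator is
    the full target suite (e.g. all 67 fragments).
    """
    if not fractions:
        raise ValueError("need at least one fragment")
    hits = sum(1 for f in fractions if f >= threshold)
    return 100.0 * hits / len(fractions)


# Invented example: five aligned fragments from one hypothetical assembly.
fragments = ["ACGTACGTAC", "ACGTACGTNN", "ACGTAC----", "ACG-------", "----------"]
fractions = [fraction_recovered(s) for s in fragments]
np50 = percent_fragments_recovered(fractions, 0.5)
np80 = percent_fragments_recovered(fractions, 0.8)
print(np50, np80)  # prints: 60.0 40.0
```

Applying the same function to de novo versus reference-based assemblies would give the DN and Ref variants.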

Mentions: In univariate analyses of all samples, the number of reads, PCR COI success, and killing chemical each showed a significant correlation with at least one measure of success (Fig 16); in some bivariate analyses, body length was significant as a secondary explanatory variable. In the analysis restricted to samples with large numbers of reads (more than 60 million), body length was the only significant explanatory variable in univariate analyses for three of the four measures of success, and killing chemical was an additional significant explanatory factor for NPRef50 in bivariate analyses. In summary, high success was correlated with a high number of reads, success at COI PCR, being killed in high concentrations of ethanol, and small body size. Curiously, two variables one might have presumed relevant to sequencing success (specimen age and total quantity of DNA) showed the weakest correlations (Fig 16).
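The squared correlation coefficients in Fig 16 come from regressing each success measure on one explanatory variable, and the bivariate analyses add a second variable. The Python sketch below shows one standard way to compute these quantities. It is not the authors' code, and the data are invented. Binary variables such as COI PCR success or killing chemical are assumed to be coded 0/1. Significance testing, which the paper also reports, is omitted. For simple least-squares regression, R² equals the squared Pearson correlation; for two predictors, it follows from the three pairwise correlations.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # e.g. a 0/1 variable that is constant within a subset of samples
        raise ValueError("a variable has zero variance")
    return sxy / sqrt(sxx * syy)


def univariate_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares fit y ~ a + b*x (equal to Pearson r squared)."""
    return pearson_r(x, y) ** 2


def bivariate_r2(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares fit y ~ a + b1*x1 + b2*x2 from pairwise correlations."""
    ry1, ry2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    if abs(r12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)


# Invented data for eight hypothetical specimens (not the paper's values).
reads_m = [22, 31, 45, 58, 64, 78, 92, 120]           # Illumina reads, millions
length_mm = [8.9, 5.2, 6.3, 4.1, 8.0, 5.5, 7.1, 4.6]  # body length, mm
success = [38, 45, 52, 60, 64, 80, 74, 91]            # e.g. an NPRef50-style percent

r2_all = univariate_r2(reads_m, success)
r2_both = bivariate_r2(reads_m, length_mm, success)

# Restrict to high-read samples, mirroring the caption's > 60 million cutoff.
keep = [i for i, r in enumerate(reads_m) if r > 60]
r2_high = univariate_r2([length_mm[i] for i in keep], [success[i] for i in keep])

print(f"reads: {r2_all:.3f}  reads+length: {r2_both:.3f}  length (high-read): {r2_high:.3f}")
```

Adding a predictor never lowers R² in least squares, so `r2_both` is at least `r2_all`. Deciding whether that gain is significant would require a partial F-test, which this sketch leaves out. Small high-read subsets can also make a 0/1 variable constant, which is why `pearson_r` rejects zero-variance input.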