The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus.

Ewen-Campen B, Shaner N, Panfilio KA, Suzuki Y, Roth S, Extavour CG - BMC Genomics (2011)

Bottom Line: We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses. Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome.


Affiliation: Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.

ABSTRACT

Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery.

Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses.

Conclusions: Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]
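The plateau in gene discovery described in the Results can be pictured as a saturation curve: the number of distinct genes seen as more reads are added. The sketch below is only an illustration under a simplifying assumption, namely that every read has already been assigned a gene label; the study itself assessed discovery through assembly, and the function name `discovery_curve` and its parameters are hypothetical.

```python
import random


def discovery_curve(read_gene_ids, step, seed=0):
    """Return (reads_sampled, distinct_genes_seen) pairs.

    read_gene_ids: one gene label per read (assumed to be known already).
    step: record a point every `step` reads, plus one at the final read.
    """
    rng = random.Random(seed)
    shuffled = list(read_gene_ids)
    rng.shuffle(shuffled)  # sample reads in random order

    seen = set()
    curve = []
    for i, gene in enumerate(shuffled, start=1):
        seen.add(gene)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve
```

A curve that flattens well before the last read suggests that additional reads from the same cDNA would add few new genes.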



Figure 2: Effects of normalization and 454 sequencing chemistry on read length and isotig length. (A) Titanium sequencing chemistry (grey, black) generally results in longer read lengths when compared with FLX chemistry (white). However, the normalized sample run with Titanium chemistry (black) had shorter read lengths than the non-normalized sample (grey). This result is likely due to a technical error in that particular sequencing run, since a 1/8 plate run of the same sample showed a read length distribution comparable to that of the non-normalized sample (Additional file 1). (B) Isotig length distributions from assemblies of Titanium-sequenced data. The longest isotig per isogroup is shown. The number of bases in the non-normalized (grey) and normalized (black) samples has been equalized to eliminate possible bias due to the greater number and length of reads obtained from the run of the non-normalized sample (see (A)). The isotigs generated from the normalized cDNA tended to be shorter than those produced by the non-normalized cDNA (see also Table 2). Pooling all FLX and Titanium reads generates an assembly with more, longer isotigs (blue).

Mentions: We prepared cDNA from ovaries and early to mid-stage embryos of O. fasciatus, covering oogenesis and all major stages of embryonic patterning (Figure 1B-D). These cDNA samples were prepared using a protocol optimized for small or limiting samples for 454 pyrosequencing (see Materials and Methods). From these libraries, we generated a total of 2,087,410 sequence reads (Table 1). This total includes reads generated using GS-FLX technology as well as reads from both normalized (N) and non-normalized (NN) cDNA sequenced using the GS-FLX Titanium platform. As expected, the reads generated using GS-FLX Titanium technology were substantially longer than those generated using GS-FLX technology (Table 1, Figure 2A). However, the N sample gave an unexpectedly low number of reads, which were on average shorter than those generated by the NN sample (Table 1; Figure 2A). Given that a pilot run of one lane (1/8 plate) of this same normalized cDNA sample generated roughly the same number and size distribution of reads as an NN pilot study (Additional file 1), we suspect that a technical error reduced the sequencing efficiency of this plate. Despite its comparatively low yield, this normalized cDNA sample still generated more than 600,000 high-quality reads, which we therefore included in subsequent analyses.
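Because the two Titanium runs differ in read number and read length, comparing their assemblies fairly requires matching the total number of bases, as the Figure 2B caption notes. The source does not say how this equalization was carried out; the sketch below shows one plausible approach, random subsampling of the larger read set down to the base count of the smaller one. The function names `subsample_to_base_count` and `equalize_bases` are hypothetical.

```python
import random


def subsample_to_base_count(read_lengths, target_bases, seed=0):
    """Pick reads at random until their summed length reaches target_bases.

    read_lengths: list with one integer length per read.
    Returns the indices of the selected reads.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    selected, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        selected.append(idx)
        total += read_lengths[idx]
    return selected


def equalize_bases(lengths_a, lengths_b, seed=0):
    """Subsample both read sets to the base count of the smaller set."""
    target = min(sum(lengths_a), sum(lengths_b))
    return (
        subsample_to_base_count(lengths_a, target, seed),
        subsample_to_base_count(lengths_b, target, seed),
    )
```

The selected set can overshoot the target by up to one read length, which is negligible at the scale of hundreds of thousands of reads.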

