A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach.

Melicher D, Torson AS, Dworkin I, Bowsher JH - BMC Genomics (2014)

Bottom Line: We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species. The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.


Affiliation: Department of Biological Sciences, North Dakota State University, 1340 Bolley Drive, 218 Stevens Hall, Fargo, ND 58102, USA. Dacotah.Melicher@ndsu.edu.

ABSTRACT

Background: The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba.

Results: Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more bases of transcript sequences than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA generated from embryo, larva, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species.

Conclusions: The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.
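The multiple k-mer idea described in the Results can be illustrated with a toy sketch. This is not the authors' pipeline: the function names `pool_contigs` and `total_bases` are hypothetical, the input is assumed to be contig sequences already produced by separate single k-mer assemblies, and redundancy is removed only by exact containment on one strand. A real meta-assembly would use an assembler to merge overlapping contigs and would also handle reverse complements.

```python
def pool_contigs(assemblies):
    """Pool contigs from several single k-mer assemblies and drop redundant ones.

    assemblies: dict mapping a k-mer length to a list of contig sequences
    (hypothetical input layout, assumed for this sketch).
    A contig is dropped when it is identical to, or a substring of,
    a contig that has already been kept.
    """
    # Deduplicate, then sort longest first so that shorter contigs are
    # always compared against the longer contigs that might contain them.
    pooled = sorted(
        {seq for contigs in assemblies.values() for seq in contigs},
        key=lambda s: (-len(s), s),
    )
    kept = []
    for seq in pooled:
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


def total_bases(contigs):
    """Total number of bases across a list of contigs."""
    return sum(len(c) for c in contigs)


if __name__ == "__main__":
    # Toy contigs from two hypothetical assemblies at k=21 and k=31.
    assemblies = {
        21: ["ACGTACGTAA", "TTGCA"],
        31: ["ACGTACGT", "GGGTTGCACC"],
    }
    merged = pool_contigs(assemblies)
    print(merged, total_bases(merged))
```

Pooling before removing contained contigs keeps any sequence that only one k-mer length recovered, which is the intuition behind recovering unique transcripts from several assemblies.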


Figure 6: Gene Ontology classification of the T. biloba transcriptome. Gene Ontology (GO) terms were assigned to all contigs from the T. biloba meta-assembly. Gene ontologies were grouped into three main categories and 42 sub-categories. Contigs are grouped by the percentage of sequences that match a specific GO term within the three main categories. The most abundant transcripts represent the sub-categories containing structural proteins and regulators of various cellular processes.

Mentions: To determine ontology, T. biloba transcripts were submitted for KEGG pathway analysis, resulting in 5,080 contigs with identified functions. Many developmentally important pathways involved in cell signaling, such as the Notch pathway, were nearly complete (Additional file 3: Table S2). Transcripts were assigned gene ontologies, which were then grouped by function (Figure 6) to determine whether the transcripts recovered from the meta-assembly were representative of the main cellular processes. A broad range of functional groups was present in the assembly, indicating that transcripts representing many different kinds of proteins were recovered. The distribution of contig gene ontologies is similar to the distribution of GO terms found in the Drosophila transcriptome and in other de novo transcriptome assembly efforts [34, 55, 52, 54].
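The grouping summarized in Figure 6 can be sketched as a simple tally. This is a minimal illustration, not the authors' annotation workflow: the function name `go_percentages`, the input layout (contig ID mapped to pairs of main category and sub-category), and the choice of denominator (all annotated contigs) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def go_percentages(annotations):
    """Percentage of annotated contigs matching each GO sub-category.

    annotations: dict mapping a contig ID to an iterable of
    (main_category, sub_category) pairs (hypothetical layout).
    Returns {main_category: {sub_category: percent_of_contigs}}.
    """
    counts = defaultdict(Counter)
    for terms in annotations.values():
        # set() ensures a contig is counted once per sub-category.
        for main, sub in set(terms):
            counts[main][sub] += 1
    # Assumption: the denominator is the number of annotated contigs.
    total = len(annotations)
    return {
        main: {sub: 100.0 * n / total for sub, n in subs.items()}
        for main, subs in counts.items()
    }


if __name__ == "__main__":
    # Toy annotations for four hypothetical contigs.
    toy = {
        "contig1": [("biological_process", "cellular process"),
                    ("molecular_function", "binding")],
        "contig2": [("biological_process", "cellular process")],
        "contig3": [("molecular_function", "binding"),
                    ("molecular_function", "catalytic activity")],
        "contig4": [("cellular_component", "cell part")],
    }
    for main, subs in go_percentages(toy).items():
        print(main, subs)
```

Because a contig can carry several GO terms, the percentages within a main category need not sum to 100.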

