The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus).

Rokyta DR, Lemmon AR, Margres MJ, Aronow K - BMC Genomics (2012)

Bottom Line: The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.

Affiliation: Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295, USA. drokyta@bio.fsu.edu

ABSTRACT

Background: Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production.

Results: We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1% nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for 35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%) and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue.

Conclusions: We have provided the most complete characterization of the genes expressed in an active snake venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the largest rattlesnake species and arguably the most dangerous snake native to the United States of America, C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.

Figure 2: Domination of the C. adamanteus venom-gland transcriptome by toxin transcripts. The 123 unique toxin sequences were clustered into 78 groups with less than 1% nucleotide divergence for estimation of abundances. (A) The vast majority of the extremely highly expressed genes were toxins. The inset shows a magnification of the top 200 transcripts. (B) Expression levels of individual toxin clusters are shown with toxin classes coded by color. The toxin clusters are in the same order as in Table 3.
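
The grouping criterion in the caption (clusters with less than 1% nucleotide divergence) can be illustrated with a minimal Python sketch. It assumes pre-aligned, equal-length sequences, an uncorrected proportion of mismatched sites as the divergence measure, and single-linkage grouping. None of these choices is stated in the text above, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def divergence(a: str, b: str) -> float:
    """Fraction of mismatched sites between two pre-aligned sequences (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_sequences(seqs: dict[str, str], threshold: float = 0.01) -> list[set[str]]:
    """Single-linkage clustering: join any two sequences diverging by less than threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if divergence(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

With single linkage, two sequences can share a cluster even if they differ by more than the threshold, provided a chain of closer intermediates connects them. A stricter rule (for example, complete linkage) would yield more and smaller clusters.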

Mentions: Our second approach to transcriptome assembly was designed to annotate as many full-length coding sequences (toxin and nontoxin) as possible and to build a reference database of sequences to facilitate the future analysis of other snake venom-gland transcriptomes. We found that NGen was much more successful at producing transcripts with full-length coding sequences but also that it was quite inefficient when the coverage distribution was extremely uneven (see Figure 2). Feldmeyer et al. [41] also found NGen to have the best assembly performance with Illumina data. We sought therefore first to eliminate the transcripts and corresponding reads for the extremely high-abundance sequences. To do so, we employed Extender as a de novo assembler by starting from 1,000 individual high-quality reads and attempting to complete their transcripts (see Methods). From 1,000 seeds, we identified 318 full-length coding sequences with 213 toxins and 105 nontoxins. After duplicates were eliminated, this procedure resulted in 58 unique toxin and 44 unique nontoxin full-length transcripts. These sequences were used to filter the corresponding reads from the full set of merged reads with NGen. We then performed a de novo transcriptome assembly on 10 million of the filtered reads with NGen, annotated full-length transcripts from contigs comprising ≥ 200 reads with significant blastx hits, and used the resulting unique sequences as a new filter. This process of assembly, annotation, and filtering was iterated two more times. The end result was 91 unique toxin and 2,851 unique nontoxin sequences.
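
The assemble, annotate, and filter cycle described above can be sketched in Python as a control-flow outline. This is not the authors' pipeline: NGen and Extender are not called, the `assemble` and `annotate` callables are hypothetical stand-ins supplied by the caller, read matching is reduced to a substring test, and taking the first `sample_size` reads as the subset is an assumption (the text does not say how the 10 million reads were chosen).

```python
from __future__ import annotations

from typing import Callable, Iterable


def filter_reads(reads: Iterable[str], transcripts: Iterable[str]) -> list[str]:
    """Drop reads that match a known transcript (toy rule: the read is a substring)."""
    known = list(transcripts)
    return [r for r in reads if not any(r in t for t in known)]


def iterative_assembly(
    reads: list[str],
    seed_transcripts: Iterable[str],
    assemble: Callable[[list[str]], list[str]],   # hypothetical assembler wrapper
    annotate: Callable[[list[str]], list[str]],   # hypothetical full-length annotator
    sample_size: int,
    rounds: int = 3,  # one initial pass plus two further iterations, as in the text
) -> tuple[set[str], list[str]]:
    """Repeatedly assemble a subset of unexplained reads and filter with the results."""
    known = set(seed_transcripts)
    # Remove reads already explained by the seed-derived, high-abundance transcripts.
    remaining = filter_reads(reads, known)
    for _ in range(rounds):
        subset = remaining[:sample_size]          # assumed sampling scheme
        new = set(annotate(assemble(subset))) - known
        known |= new
        # Newly annotated transcripts become the filter for the next round.
        remaining = filter_reads(remaining, new)
    return known, remaining
```

Removing reads from transcripts that are already annotated flattens the coverage distribution, so each later round spends its fixed read budget on lower-abundance transcripts.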

