The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus).

Rokyta DR, Lemmon AR, Margres MJ, Aronow K - BMC Genomics (2012)

Bottom Line: The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.


Affiliation: Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295, USA. drokyta@bio.fsu.edu

ABSTRACT

Background: Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production.

Results: We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1% nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for 35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%) and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue.

Conclusions: We have provided the most complete characterization of the genes expressed in an active snake venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the largest rattlesnake species and arguably the most dangerous snake native to the United States of America, C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.


Figure 1: Merging overlapping reads. (A) Reads are slid along each other until the number of matches exceeds the significance threshold. In the example shown, the optimal overlap is 74 nucleotides (nt). (B) The quality of reads declines dramatically toward their 3’ ends, where overlap occurs if the fragment length is less than twice the read length, allowing the actual quality to be much higher than the nominal values. The example shown is the average of pairs that overlap by exactly 50 nt.
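The sliding-overlap idea in panel (A) can be illustrated with a short Python sketch. This is a simplified stand-in, not the authors' method: the source describes a significance threshold on the number of matches, whereas this sketch assumes a fixed minimum identity (`min_identity`) and minimum overlap (`min_overlap`), both hypothetical parameters. It also assumes that agreeing bases in the overlap can have their phred scores added (treating the two observations as independent), which is one way to see why merged qualities can exceed the nominal per-read values noted in panel (B). Fragments shorter than a single read are not handled.

```python
# Hypothetical sketch of merging an overlapping read pair.
# Not the published algorithm: thresholds and quality rules are assumptions.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, qual1, read2, qual2, min_overlap=10, min_identity=0.9):
    """Merge a forward read with its mate; return (sequence, qualities) or None.

    read2 and qual2 are given as sequenced (opposite strand), so they are
    reoriented before comparison. Overlaps are tried from longest to shortest
    and the first one meeting min_identity is accepted.
    """
    mate = reverse_complement(read2)
    mate_qual = qual2[::-1]
    max_overlap = min(len(read1), len(mate))

    for overlap in range(max_overlap, min_overlap - 1, -1):
        start = len(read1) - overlap
        tail = read1[start:]
        head = mate[:overlap]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / overlap < min_identity:
            continue

        # Non-overlapping prefix comes from read1 unchanged.
        seq = list(read1[:start])
        qual = list(qual1[:start])

        # In the overlap, agreeing bases reinforce each other (scores add);
        # disagreeing bases keep the higher-quality call at reduced confidence.
        for i in range(overlap):
            a, qa = tail[i], qual1[start + i]
            b, qb = head[i], mate_qual[i]
            if a == b:
                seq.append(a)
                qual.append(qa + qb)
            else:
                seq.append(a if qa >= qb else b)
                qual.append(abs(qa - qb))

        # Non-overlapping suffix comes from the reoriented mate.
        seq.extend(mate[overlap:])
        qual.extend(mate_qual[overlap:])
        return "".join(seq), qual

    return None
```

For a 30-nt fragment sequenced with 20-nt reads from each end, the two reads overlap by 10 nt and the sketch returns the full fragment, with doubled quality scores in the overlapping region.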

Mentions: We generated a total of 95,643,958 pairs of reads that passed the Illumina quality filter for > 19 gigabases (Gb) of sequence from a cDNA library with an average insert size of ∼170 nt. Of these reads, 72,114,709 (75%) were merged (see Methods) on the basis of their 3’ overlap (Figure 1), yielding composite reads of average length 142 nt with average phred qualities > 40 and a total length > 10 Gb. This merging of reads reduced the effective size of the data set without loss of information and provided long reads to facilitate accurate assembly.
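The totals quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text (read-pair count, 100-nt read length, merged-pair count, and 142-nt average merged length); the variable names are illustrative, and the merged-base total is approximate because the average length is a rounded value.

```python
# Sanity check of the sequencing totals quoted in the text.
read_pairs = 95_643_958      # pairs passing the Illumina quality filter
read_length = 100            # nt per read
merged_pairs = 72_114_709    # pairs merged on their 3' overlap
mean_merged_length = 142     # nt, rounded average from the text

total_bases = read_pairs * 2 * read_length
merged_fraction = merged_pairs / read_pairs
merged_bases = merged_pairs * mean_merged_length

print(f"total sequence: {total_bases / 1e9:.2f} Gb")
print(f"merged fraction: {merged_fraction:.1%}")
print(f"merged sequence: {merged_bases / 1e9:.2f} Gb (approximate)")
```

The results agree with the stated values: more than 19 Gb of raw sequence, about 75% of pairs merged, and more than 10 Gb in the merged reads.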

