Insights into a dinoflagellate genome through expressed sequence tag analysis.

Hackett JD, Scheetz TE, Yoon HS, Soares MB, Bonaldo MF, Casavant TL, Bhattacharya D - BMC Genomics (2005)

Bottom Line: We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production.

Affiliation: Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa City, IA 52242, USA. jeremiah-hackett@uiowa.edu

ABSTRACT

Background: Dinoflagellates are important marine primary producers and grazers and cause toxic "red tides". These taxa are characterized by many unique features such as immense genomes, the absence of nucleosomes, and photosynthetic organelles (plastids) that have been gained and lost multiple times. We generated EST sequences from non-normalized and normalized cDNA libraries from a culture of the toxic species Alexandrium tamarense to elucidate dinoflagellate evolution. Previous analyses of these data have clarified plastid origin and here we study the gene content, annotate the ESTs, and analyze the genes that are putatively involved in DNA packaging.

Results: Approximately 20% of the 6,723 unique ESTs (11,171 total 3'-reads) could be annotated using BLAST searches against GenBank. Several putative dinoflagellate-specific mRNAs were identified, including one novel plastid protein. Dinoflagellate genes, similar to those of other eukaryotes, have a high GC-content that is reflected in the amino acid codon usage. Highly represented transcripts include histone-like (HLP) and luciferin binding proteins, and several genes occur in families that encode nearly identical proteins. We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair.

Conclusion: This is the most extensive collection to date of ESTs from a toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production.

Figure 2: GO category assignment of A. tamarense ESTs. Classification of 1,203 A. tamarense ESTs into the GO categories.

Mentions: Each cluster was searched against the SwissProt protein database using blastx. A total of 515 hits with an e-value less than 1e-20 were identified that terminated within 10 amino acids of the end of the SwissProt entry. From these hits, we estimated that the 3'-UTRs ranged in length from 25 to 620 nt, with a mean length of 155 nt. This is shorter than the average length observed for fungi (~200 nt) and metazoans (300–600 nt) [22]. However, this analysis is likely to underestimate the average 3'-UTR length because only ESTs that were sequenced into the coding region were included.

The 3'-UTRs of A. tamarense cDNAs are also interesting because of their apparent lack of a polyA signal. Both simple n-mer searches (e.g., hexamer, pentamer) and the Gibbs sampler were used to assay the canonical region from -11 to -30 preceding the polyadenylation site in search of a polyadenylation signal. We were unable to find a single hexamer or pentamer, or a related set of them, that is enriched in the 3'-UTRs (data not shown). Clearly, polyadenylation of transcripts occurs in A. tamarense; however, the mechanism by which this process takes place apparently does not involve a typical polyA signal.

These ESTs were also analyzed for GC-content and codon usage. Coding region GC-content was 60.8%, whereas GC-content in the 3'-UTR was slightly lower at 57.6%. The GC-content is reflected in the codon usage (Table 2), whereby 3rd positions are strongly biased towards Gs or Cs. The stop codon TGA is also significantly favoured over TAG and TAA (frequencies of 411, 71, and 25 occurrences, respectively).

The accession numbers of SwissProt hits with an e-value of 9e-10 or below (1,292 sequences) were submitted to the ProToGo server for GO category assignment [23]. A total of 1,203 of the SwissProt accession numbers could be assigned to GO categories. The results are summarized in Figure 2. The functional distribution of the A. tamarense ESTs that could be placed among GO categories is typical of other eukaryotes. However, the overall small number (i.e., 20%) of significant hits to GenBank is surprising, suggesting either that many A. tamarense proteins are highly diverged and/or encode novel dinoflagellate-specific functions (e.g., Figure 1), or that the sequence does not extend into the coding region of the transcript.
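
The sequence statistics described above (GC-content, third-position GC bias, stop codon usage, and n-mer counts in the region from -30 to -11 before the polyadenylation site) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, each 3'-UTR string is assumed to end at the base immediately preceding the polyadenylation site, and coding sequences are assumed to be in frame. Only the stop codon counts (411, 71, 25) come from the text.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc3_fraction(cds: str) -> float:
    """GC fraction at third codon positions; cds is assumed to be in frame."""
    return gc_fraction(cds.upper()[2::3])


def upstream_kmer_counts(utrs, k=6, start=30, end=11):
    """Count k-mers lying wholly inside positions -start..-end of each UTR.

    Assumption: the last base of each UTR is position -1, i.e. the base
    just before the polyadenylation site. UTRs shorter than `start` are
    skipped because the window would be incomplete.
    """
    counts = Counter()
    for utr in utrs:
        utr = utr.upper()
        if len(utr) < start:
            continue
        window = utr[len(utr) - start : len(utr) - end + 1]
        for i in range(len(window) - k + 1):
            counts[window[i : i + k]] += 1
    return counts


# Stop codon occurrences as reported in the text.
stop_counts = {"TGA": 411, "TAG": 71, "TAA": 25}
stop_total = sum(stop_counts.values())
stop_fractions = {codon: n / stop_total for codon, n in stop_counts.items()}

if __name__ == "__main__":
    for codon, frac in stop_fractions.items():
        print(f"{codon}: {frac:.1%}")
```

In a search of this kind, a conserved signal such as the canonical AATAAA hexamer would be expected to stand out in the counts returned by `upstream_kmer_counts` relative to other hexamers; the text reports that no such enriched n-mer was found. Judging enrichment properly would also require a background model, which this sketch does not attempt.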

