Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicing and polyadenylation sites.

Siegel TN, Hekstra DR, Wang X, Dewell S, Cross GA - Nucleic Acids Res. (2010)

Bottom Line: In some cases, alternative SAS would give rise to mRNAs encoding proteins with different N-terminal sequences. We could identify the introns in two genes known to contain them, but found no additional genes with introns. Our study demonstrates the usefulness of the RNA-seq technology to study the transcriptional landscape of an organism whose genome has not been fully annotated.


Affiliation: Laboratory of Molecular Parasitology, The Rockefeller University, New York, NY 10065, USA.

ABSTRACT
Transcription of protein-coding genes in trypanosomes is polycistronic and gene expression is primarily regulated by post-transcriptional mechanisms. Sequence motifs in the untranslated regions regulate mRNA trans-splicing and RNA stability, yet where UTRs begin and end is known for very few genes. We used high-throughput RNA-sequencing to determine the genome-wide steady-state mRNA levels ('transcriptomes') for approximately 90% of the genome in two stages of the Trypanosoma brucei life cycle cultured in vitro. Almost 6% of genes were differentially expressed between the two life-cycle stages. We identified 5' splice-acceptor sites (SAS) and polyadenylation sites (PAS) for 6959 and 5948 genes, respectively. Most genes have between one and three alternative SAS, but PAS are more dispersed. For 488 genes, SAS were identified downstream of the originally assigned initiator ATG, so a subsequent in-frame ATG presumably designates the start of the true coding sequence. In some cases, alternative SAS would give rise to mRNAs encoding proteins with different N-terminal sequences. We could identify the introns in two genes known to contain them, but found no additional genes with introns. Our study demonstrates the usefulness of the RNA-seq technology to study the transcriptional landscape of an organism whose genome has not been fully annotated.



Figure 4: Length of 5′ and 3′ UTRs. (A) Histogram showing the length distribution of 5′ UTR (left panel; n = 6644; window 50 nt) and 3′ UTR (right panel; n = 5911; window 100 nt) for the predominant SAS and PAS. If multiple SAS or PAS occurred at the same frequency, the length of the shortest UTR was used for this histogram. (B) Quantification of multiple PAS used by β-tubulin (left panel) and α-tubulin (right panel). Nucleotides labeled in red indicate PAS. For β-tubulin, additional sequence tags indicated PAS at 318 bp (1 tag), 323 bp (3), 334 bp (1) and 476 bp (1) downstream of the ORF.
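
The selection rule in the caption (take the most frequently used site, and on a tie take the shortest UTR) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `predominant_site` and the tied example are hypothetical, and the β-tubulin input contains only the four minor PAS quoted in the caption, not the main sites shown in panel B.

```python
def predominant_site(tag_counts):
    """Return the UTR length (bp) of the most frequently used site.

    tag_counts maps a UTR length in bp to the number of sequence tags
    supporting that site. Ties are broken in favour of the shortest UTR,
    which is the rule stated in the figure caption.
    """
    if not tag_counts:
        raise ValueError("no sites supplied")
    # Order by highest tag count first, then by shortest UTR length.
    return min(tag_counts, key=lambda length: (-tag_counts[length], length))


# Minor beta-tubulin PAS from the caption: bp downstream of the ORF -> tags.
minor_beta_tubulin_pas = {318: 1, 323: 3, 334: 1, 476: 1}
print(predominant_site(minor_beta_tubulin_pas))  # 323

# Hypothetical tie: two sites with equal support, so the shorter UTR wins.
tied_sites = {120: 5, 95: 5, 300: 2}
print(predominant_site(tied_sites))  # 95
```

Among the minor sites alone, the site at 323 bp has the most support; in the tied case the 95 bp UTR is chosen over the 120 bp one.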

Mentions: Using this approach, we were able to identify 10 857 SAS for 6959 genes (Supplementary Table S4). The average number of significant SAS per gene was 1.6 (2.6 before very minor hits were edited out of the curated set). The average 5′ UTR length, based on the predominant SAS when more than one SAS was identified, was 184 bp (median 89 bp), and 80% of 5′ UTRs were shorter than 248 bp (Figure 4A). For most genes, it was impossible to determine whether any SAS were used differentially between life-cycle stages, because data were limited for minor SAS and for bloodstream-stage cells. In the single TREU 927 sample, 298 SAS were identified for 288 genes that were not represented in the Lister 427 samples, presumably because of sequence polymorphisms or different patterns of gene expression between the two strains. It would have been preferable to analyze data only or mainly from TREU 927, but this was not possible for technical reasons. It would have been ideal for the Lister 427 genome sequence to be available, as this is the most widely used and conveniently manipulated laboratory strain of T. brucei, but we do not yet have that option.
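
As a quick check of the per-gene averages quoted above, the ratios can be recomputed from the reported counts. This is only an arithmetic sketch using the numbers in the text; the helper name `sites_per_gene` is hypothetical.

```python
def sites_per_gene(n_sites, n_genes):
    """Mean number of sites per gene for the given totals."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return n_sites / n_genes


# Curated set: 10 857 SAS identified for 6959 genes.
curated_mean = sites_per_gene(10857, 6959)
print(round(curated_mean, 1))  # 1.6

# TREU 927-only set: 298 SAS for 288 genes.
treu_mean = sites_per_gene(298, 288)
print(round(treu_mean, 2))  # 1.03
```

The first ratio reproduces the reported 1.6 SAS per gene. The second suggests that almost all of the 288 genes seen only in TREU 927 contributed a single SAS. The mean 5′ UTR length (184 bp) being about twice the median (89 bp) is consistent with a right-skewed length distribution, that is, a minority of long UTRs.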

