Identification of novel exons and transcribed regions by chimpanzee transcriptome sequencing.

Wetterbom A, Ameur A, Feuk L, Gyllensten U, Cavelier L - Genome Biol. (2010)

Bottom Line: Using stringent criteria for transcription, we identify 12,843 expressed genes, with a majority being found in both tissues. A putative novel multi-exon gene of the ATP-cassette transporter family does not appear to be functional in human, since one exon is absent from the human genome. Our results extend the chimpanzee gene catalogue with a large number of novel exons and 3' UTRs and thus support the view that mammalian gene annotations are not yet complete.

Affiliation: Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, SE-751 85 Uppsala, Sweden.

ABSTRACT

Background: We profile the chimpanzee transcriptome by using deep sequencing of cDNA from brain and liver, aiming to quantify expression of known genes and to identify novel transcribed regions.

Results: Using stringent criteria for transcription, we identify 12,843 expressed genes, with a majority being found in both tissues. We further identify 9,826 novel transcribed regions that are not overlapping with annotated exons, mRNAs or ESTs. Over 80% of the novel transcribed regions map within or in the vicinity of known genes, and by combining sequencing data with de novo splice predictions we predict several of the novel transcribed regions to be new exons or 3' UTRs. For approximately 350 novel transcribed regions, the corresponding DNA sequence is absent in the human reference genome. The presence of novel transcribed regions in five genes and in one intergenic region is further validated with RT-PCR. Finally, we describe and experimentally validate a putative novel multi-exon gene that belongs to the ATP-cassette transporter gene family. This gene does not appear to be functional in human since one exon is absent from the human genome. In addition to novel exons and UTRs, novel transcribed regions may also stem from different types of noncoding transcripts. We note that expressed repeats and introns from unspliced mRNAs are especially common in our data.

Conclusions: Our results extend the chimpanzee gene catalogue with a large number of novel exons and 3' UTRs and thus support the view that mammalian gene annotations are not yet complete.

Figure 2: Pearson correlations of expression signals for different sequencing runs. (a,b) The correlation between 35-bp versus 50-bp reads for the datasets brainF and liverF. (c,d) The correlation between the two individuals in brain and liver, respectively. (e,f) The correlation between brain and liver within each individual. Gene expression values were estimated as the depth of coverage per million reads (dcpm), using the last 500 bp of RefSeq genes. The axes in the figures represent log2(dcpm).

Mentions: To define genes, we used annotations of human and chimpanzee RefSeq genes [28], which are based on alignments of RefSeq RNAs. Gene expression was estimated using the 'average depth of coverage per million reads' (dcpm), as proposed by Hillier et al. [16]. Dcpm is the coverage score normalized for the total number of mapped reads. To avoid the observed 3' bias, expression was estimated only for the last 500 bp of each gene, ensuring that the expression data were comparable between genes of different lengths. Two of the samples, brainF and liverF, were sequenced with different read lengths (35 bp and 50 bp). These technical replicates showed a very high correlation of gene expression levels (Figure 2a,b), demonstrating the reproducibility of the sequencing results. Consequently, we merged the technical replicates to obtain four final datasets: brainF, liverF, brainM and liverM. A higher correlation of transcription levels was seen between identical tissues from the two individuals than between the two different tissues from the same individual (Figure 2c-f).
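The dcpm measure and the log2 correlations shown in Figure 2 can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that dcpm is the mean per-base depth over the last 500 bp of a gene divided by the number of mapped reads in millions, that the coverage list is given in 5' to 3' transcript orientation, and that genes with zero signal in either sample are dropped before taking log2. The function names (`dcpm`, `pearson`, `log2_correlation`) are hypothetical.

```python
import math


def dcpm(coverage, total_mapped_reads, window=500):
    """Mean depth over the last `window` bases, per million mapped reads.

    Assumption: `coverage` lists per-base read depth in 5'->3' order,
    so the slice below is the 3' end. Genes shorter than `window`
    contribute their full length.
    """
    tail = coverage[-window:]
    if not tail or total_mapped_reads <= 0:
        raise ValueError("need non-empty coverage and a positive read count")
    mean_depth = sum(tail) / len(tail)
    return mean_depth / (total_mapped_reads / 1_000_000)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log2_correlation(sample_a, sample_b):
    """Pearson correlation of log2(dcpm) between two samples.

    Assumption: genes with zero dcpm in either sample are skipped,
    because log2(0) is undefined.
    """
    pairs = [(math.log2(a), math.log2(b))
             for a, b in zip(sample_a, sample_b) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("fewer than two genes with signal in both samples")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

Under these assumptions, calling `log2_correlation` on the per-gene dcpm values of two datasets (for example the 35-bp and 50-bp runs of brainF) would give the kind of correlation plotted in Figure 2a,b.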

